Fungal Zuotin Proteins Evolved from MIDA1-like

Fungal Zuotin Proteins Evolved from MIDA1-like Factors by
Lineage-Specific Loss of MYB Domains
Edward L. Braun and Erich Grotewold
Department of Plant Biology and Plant Biotechnology Center, Ohio State University
Proteins are often characterized by the presence of multiple domains, which make specific contributions to their
cellular function. While the gain of domains in proteins by duplication and shuffling is well established, domain
loss is poorly documented. Here, we provide evidence that domain loss has played an important role in the evolution
of protein architecture and function by demonstrating that fungal Zuotin proteins evolved from MIDA1-like proteins,
present in animals and plants, by complete loss of the carboxyl-terminal MYB domains. Phylogenetic analyses of
the DnaJ motif (the J domain) present in both Zuotin and MIDA1 proteins were complicated by the limited length
and profound differences in evolutionary rates exhibited by this domain. To rigorously examine J domain phylogeny,
we combined the nonparametric bootstrap with Monte Carlo simulation. This method, which we have designated
the resampled parametric bootstrap, allowed us to assess type I and type II error associated with these analyses.
These results revealed significant support for domain loss rather than domain gain or gene loss involving paralogs.
The absence of sequences related to the MIDA1 MYB domains in Saccharomyces cerevisiae further indicates that
the domains have been completely lost, consistent with known functional differences between Zuotin and MIDA1
proteins. These analyses suggest that the description of additional examples of complete domain loss may provide
a method to identify orthologous proteins exhibiting functional differences using genomic sequence data.
Introduction
Proteins often consist of multiple domains (Doolittle 1995; Henikoff et al. 1997), and evolutionary changes in the domain structures of these proteins have obvious functional implications. Indeed, domain structure
differences have been used to infer protein-protein interactions (Enright et al. 1999; Marcotte et al. 1999).
Similarity of domain structures has also been used as
information for assigning orthology (Huynen and Bork
1998), a method especially helpful for large-scale genomic comparisons (e.g., Rubin et al. 2000). Variation
in domain organization has largely been attributed to
domain shuffling, intramolecular duplications, the fusion
of distinct proteins to form multifunctional polypeptides,
and pathways leading to the acquisition of novel domains during evolution (Doolittle 1995; Henikoff et al.
1997; Li 1997).
In principle, domain loss provides an additional
mechanism by which diversity in domain organization
has been generated, but there have been few attempts to
document domain loss, in part because of the difficulties
associated with distinguishing domain loss from domain
gain or gene loss. Domain loss can reflect the fission of
specific sequence modules (Snel, Bork, and Huynen
2000), resulting in the dispersal of the domains present
in a single protein into two or more polypeptides. This
mechanism is conservative, since orthologs of each domain remain after the fission. Alternatively, specific sequence modules present in proteins may undergo deletion or sufficient sequence divergence in some lineages
Abbreviations: ML, maximum likelihood; MP, maximum parsimony; NJ, neighbor joining.
Key words: domain evolution, gene loss, protein evolution, statistical power analysis, comparative genomics, Monte Carlo simulation.
Address for correspondence and reprints: Edward L. Braun, Department of Plant Biology and Plant Biotechnology Center, 206 Rightmire Hall, 1060 Carmack Road, Ohio State University, Columbus,
Ohio 43210. E-mail: [email protected].
Mol. Biol. Evol. 18(7):1401–1412. 2001
q 2001 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038
to be eliminated (e.g., Rodriguez-Monge, Ouzounis, and
Kyrpides 1998; Braun and Grotewold 1999). This form
of domain loss may be conservative, since it can occur
after gene duplications as part of functional divergence
between paralogs (e.g., Braun and Grotewold 1999). Alternatively, DNA sequences encoding specific domains
can be completely deleted in some lineages (complete
domain loss). This type of change might result in proteins with orthologous domains likely to have undergone
functional changes as profound as those typically characterizing paralogs with similar differences in domain
structure.
Unequivocal demonstration of complete loss of domains requires entire genome sequences, since the existence of paralogs with the ancestral domain organization must be excluded. Even when complete genome
sequences are available, it is important to examine the
phylogenetic relationships among sequences containing
the more broadly shared domain to establish whether the
change in domain organization reflects loss or gain.
However, the short lengths of many domains and the
high degree of divergence between many multidomain
proteins that exhibit differing domain organizations present substantial challenges for phylogenetic reconstruction (Nei, Kumar, and Takahashi 1998; Philippe and
Laurent 1998). Since proteins that have undergone complete domain loss are unlikely to be sampled from a
broad variety of organisms, limited taxon sampling may
also have an impact on the estimates of phylogenetic
relationships for these proteins.
The MIDA/Zuotin family of proteins provides an
interesting example of variation in the domain structure
associated with particular groups of organisms. MIDA1like proteins, found in animals and plants, are characterized by the presence of an amino-terminal J domain
embedded in a region of similarity to fungal Z-DNA
binding proteins (Zuotins), along with two carboxyl-terminal MYB domains (fig. 1). The J domain is present
in a large number of prokaryotic and eukaryotic proteins
1401
1402
Braun and Grotewold
FIG. 1.—Structure of MIDA1 and Zuotin proteins. A, Schematic representation of MIDA1 and Zuotin homologs, highlighting the relevant
domains. B, Alignment of the mouse MIDA1 (Shoji et al. 1995), budding yeast (Saccharomyces cerevisiae) Zuotin (Zhang et al. 1992), and
Arabidopsis F24K9.12 (MIDA1-like) proteins. Identical residues are indicated in black, gaps in the alignment are indicated with dashes, and
the positions of the J domain and the MYB domains are shaded.
(Kelley 1998), but the MYB domains are related to a
group of transcription factors that have been found almost exclusively in the eukaryotes (Ouzounis and Papavassiliou 1997). Thus, the MIDA1 homologs are multidomain proteins in which the ancient J domain, present
in both prokaryotes and eukaryotes, has been fused to
eukaryote-specific MYB domains.
The mouse MIDA1 protein was identified because
it interacts physically with the helix-loop-helix protein
Id1 (Shoji et al. 1995), and overexpression suggests a
role in the regulation of growth and proliferation. This
function is consistent with the identification of the human MIDA1 ortholog as an M-phase phosphoprotein
(MPP11; Matsumoto-Taniura et al. 1996) and the defect
in asymmetric cell division evident in Volvox carteri
GlsA (MIDA1) mutants (Miller and Kirk 1999). Thus,
the ancestral function of MIDA1 in animals and plants
is likely to involve the regulation of cell division, although the specific targets of this protein remain unknown. MIDA1-like proteins are absent from the fungi
despite a large number of phylogenetic analyses supporting an animal-fungal clade that excludes the plants
(reviewed by Braun et al. 1998; Baldauf 1999). However, the fungal Zuotins (Zhang et al. 1992) correspond
to MIDA1 homologs lacking the conserved carboxylterminal MYB domains. Thus far, Zuotins have not been
Domain Loss in Zuotin Proteins
found in the plant or animal kingdoms. Saccharomyces
cerevisiae Zuotin, encoded by the ZUO1 gene, is a ribosome-associated DnaJ (Yan et al. 1998) that exhibits
tRNA- and Z DNA-binding activities (Zhang et al. 1992;
Wilhelm et al. 1994). These and other findings (Yan et
al. 1998) suggest that Zuo1p is the DnaJ partner of the
Ssb-type Hsp70 proteins (Nelson et al. 1992), and thus
a component of the fungal translation machinery.
Here, we provide evidence that complete loss of
conserved domains is one mechanism by which diversity in the domain distribution of multidomain proteins
is established. By examining the origin of fungal Zuotin
homologs, we determined that these DnaJ proteins originated from MIDA1-like proteins, present in animals
and plants, by the complete loss of carboxyl-terminal
MYB domains. We present an estimate of phylogenetic
relationships among J domains—shared by MIDA1 proteins, Zuotins, and more distantly related DnaJ/Hsp40
homologs—using the results to distinguish between domain loss and more complex patterns of gene duplication and gene loss that could produce similar end results.
To examine the reliability of J domain phylogeny, we
employed the ‘‘resampled parametric bootstrap,’’ combining the nonparametric bootstrap (Felsenstein 1985)
with Monte Carlo simulation. This tool permitted us to
place our confidence in the phylogenetic reconstruction
in a substantially more rigorous framework, allowing the
impact of limited sequence length and unequal evolutionary rates on both type I and type II error to be assessed. Our findings provide the first described example
of complete domain loss, consistent with observed functional differences between Zuotins and MIDA1-like proteins, which we demonstrate to be orthologs. Domain
loss may represent a general mechanism underlying variation in the structure of multidomain proteins, and the
complete domain loss exemplified by the MIDA1/Zuotin
proteins may also contribute to functional differences
between proteins.
Materials and Methods
Selection and Alignment of Sequences
Homologs of the MIDA1 protein were identified
using BLASTP (Altschul et al. 1997), and the positions
of protein sequence motifs were established by searching the Pfam (Bateman et al. 2000). Sequences were
aligned using CLUSTAL W (Thompson, Higgins, and
Gibson 1994) followed by manual adjustment. Full
alignments are available on request.
The protein sequences used in these analyses were
as follows: (1) MIDA1 homologs from Arabidopsis
thaliana (AAF00659 [F24K9.12] and BAA98200
[K16F4.7]), Caenorhabditis elegans (AAB09157), Drosophila melanogaster (AAF51678), Homo sapiens
(CAA66913), Mus musculus (BAA09854), and Volvox
carteri (AAD26632); (2) Zuotin homologs from S. cerevisiae (CAA45156) and Schizosaccharomyces pombe
(CAB10796); (3) eukaryotic J domain proteins (closely
related to MIDA1/Zuotin) from A. thaliana (AAD12011
[T3K9.23] and T00641 [F3I6.4]), Babesia bovis
(AAC27389), D. melanogaster (AAF43627), Hevea
1403
brasiliensis (AAD120055), H. sapiens (Q99615), Nicotiana tabacum (AAD09516), S. cerevisiae (AAB67594),
and Salix gilgiana (BAA35121 and BAA76888); (4) eukaryotic J domain proteins (more distantly related to
MIDA1/Zuotin) from C. elegans (T15851 [C56C10.11],
T22648 [F54D5.8], and T24396 [T03F6.2]), D. melanogaster (Q24133 [DNAJ1] and AAF47301 [GH03108]),
H. sapiens (BAA88769 [DNAJ], BAA02656 [DNAJ-2],
NPp036398 [HSP40–3], and NPp006136 [HDJ1]), M.
musculus (NPp031895), Pisum sativum (T06594), Plasmodium falciparum (F71623 [PFB0090c] and G71610
[PFB0595w]), S. cerevisiae (NPp014172), and S. pombe
(T41362); (5) prokaryotic J domain proteins from Borrelia burgdorferi (F70181), Buchnera sp. (BAB12870),
Chlamydia pneumoniae (G72128), Escherichia coli
(P08622), Haemophilus ducreyi (P48298), Haemophilus
influenzae (P43735), Lactobacillus sakei (CAA06942),
Mycoplasma genitalium (P47265), Mycoplasma pneumoniae (P78004), Nitrosomonas europaea (O06431),
Rhodothermus marinus (AAD37973), Synechocystis sp.
(P50027), Thermus thermophilus (Q56237), and Thermotoga maritima (B72327).
Phylogenetic Reconstruction
Phylogenetic analyses using amino acid sequences
were conducted using a combination of neighbor joining
(NJ; Saitou and Nei 1987), maximum parsimony (MP),
and maximum likelihood (ML). Analyses conducted using the full alignment and after deleting regions that
were difficult to align yielded similar results. Results
obtained using the full alignment are reported.
NJ trees were inferred using either ML distance
estimates or p-distances (uncorrected distances) and
constraining branch lengths to be nonnegative, using either the appropriate option in PAUP* 4.0b4a (Swofford
2000) or nnneighbor (W. J. Bruno, Los Alamos National
Laboratories). ML distances were estimated using
protml from MOLPHY 2.3b3 (J. Adachi and M. Hasegawa, Institute of Statistical Mathematics, Japan) with
options ‘‘-D’’ (distance matrix) and ‘‘-df’’ (for PAM distances; Dayhoff, Schwartz, and Orcutt 1978). ML distance estimates under the BLOSUM model (Henikoff
and Henikoff 1992) were calculated using a modified
version of protml.
Phylogenetic reconstruction by MP used PAUP*
4.0b4a, weighting amino acid changes equally and collapsing branches if the minimum branch length was 0.
ML trees were estimated using protml and the PAM
model, which exhibited the best fit to the data (see Results and Discussion). MP tree searches in PAUP* used
10 random-addition sequence replicates and TBR branch
swapping. Amino acid substitutions at specific sites
were examined using the ‘‘trace character’’ option in
MACLADE 3.05 (Maddison and Maddison 1992), and
overall patterns of amino acid substitution were inferred
using the ‘‘state changes and stasis’’ option in MACLADE, counting unambiguous events only. Inferred amino
acid changes were classified as conservative (changes to
another amino acid within the same class, using the six
classes of amino acids described by Dayhoff, Schwartz,
1404
Braun and Grotewold
FIG. 2.—Phylogeny of the MIDA1/Zuotin proteins based on the Zuotin homology region. Branch lengths were estimated by maximumlikelihood using the PAM1G model of sequence evolution and are presented as expected amino acid substitutions per site (EAASS). Bootstrap
support based on neighbor joining of PAM distances (500 replicates) is presented above branches as percentages.
and Orcutt [1978]) or radical (changes to another class
of amino acid). For the PAM model, the expected numbers of amino acid substitutions in each class were estimated by simulations, as described below. Amongsites rate heterogeneity was examined using an eightcategory discrete approximation to a G distribution
(Yang 1994), as implemented in TREE-PUZZLE 4.0.2
(K. Strimmer, University of Oxford, and A. von Haeseler, Max Planck Institute for Evolutionary Anthropology). Improvements in model fit for nested models were
evaluated using the likelihood ratio test (Felsenstein
1988; Goldman and Whelan 2000), comparing the test
statistic (2d, where d is the difference between ln L values for each model) with the appropriate x2 or x̄2
distribution.
The reliability of phylogenetic analyses was assessed by nonparametric bootstrap resampling (Felsenstein 1985) using 500 replicates and adding sequences
in random orders for each replicate. MP bootstrap analyses used 10 random additions and no branch swapping.
To examine the probability of obtaining specific
bootstrap proportions, 500 data sets were simulated
based on the null hypothesis (H0) tree using PSeq-Gen
1.1 (Grassly, Adachi, and Rambaut 1997) and the
PAM1G model of evolution with parameters estimated
from the data assuming H0. (The H0 topologies, branch
lengths, and other simulation parameters are available
on request.) Then, the nonparametric bootstrap support
for the relevant group was estimated from each simulated data set as described above. The probability of
observing a given bootstrap value if H0 were correct
corresponds to the proportion of simulated data sets with
bootstrap values greater than or equal to the value, with
the critical bootstrap value given by the upper 5% of
this null distribution (5% probability of type I error).
The power of a specific analysis can be assessed using
simulations under the alternative hypothesis. Power was
calculated using the proportion of simulations under the
alternative hypothesis (HA) with bootstrap support in the
upper 5% of the distribution generated using the null
hypothesis (since simulations under HA with support under the critical bootstrap value would reflect cases of
type II error, and power 5 1 2 b, where b is the type
II error probability). Like the data for H0 simulations,
all data relevant to the HA simulations are available on
request. Unix shell scripts used to conduct these analyses are also available on request.
Results and Discussion
Primary Structure and Molecular Evolution of MIDA1
Homologs
The primary question we wanted to address was
whether the fungal Zuotins exhibit an orthologous or
paralogous relationship to MIDA1 proteins. Given the
large number of distinct analyses supporting the existence of an animal-fungal clade (Baldauf and Palmer
1993; Wainright et al. 1993; Nikoh et al. 1994; reviewed
by Braun et al. 1998; Baldauf 1999), an orthologous
relationship between MIDA1-type proteins and the Zuotins would suggest that the difference in their domain
organizations reflects domain loss. Searches of genomic
sequences from S. cerevisiae and S. pombe did not reveal the presence of any MYB domains closely related
to those present in the animal and plant MIDA1 homologs (data not shown). This indicates that the domain
loss event would be complete, rather than a more conservative gene fission.
To investigate evolutionary relationships among
MIDA1-like proteins from animals, plants, and fungi,
we estimated the phylogeny of these proteins using the
Zuotin homologous region (fig. 1B). These analyses revealed a high level of bootstrap support for each eukaryotic kingdom (fig. 2). In fact, this estimate of
MIDA1/Zuotin phylogeny is compatible with current estimates of animal phylogeny, which suggest the existence of a clade (the Ecdysozoa; Aguinaldo et al. 1997)
containing the arthropods (D. melanogaster) and nematodes (C. elegans). The only gene duplication implied
by this phylogeny occurred within the land plant lineage
after the divergence of the chlorophyte algae, resulting
in the A. thaliana MIDA1 homologs encoded by paralogous genes on chromosomes 3 (F24K9.12) and 5
(K16F4.7).
Domain Loss in Zuotin Proteins
Although other MIDA1 or Zuotin paralogs containing the J domain could not be identified, searches using
the MYB domains revealed the existence of proteins
containing these MYB domains related to those in
MIDA1 in both C. elegans and A. thaliana (data not
shown). In addition, a distinct class of plant MYB homologs, including the tomato I-box binding factor
(Rose, Meier, and Wienand 1999), are characterized by
the presence of a MIDA1-like MYB domain, in addition
to a more distantly related MYB domain. The existence
of these proteins can be explained by domain shuffling
and the more conservative form of domain loss in which
gene duplication precedes domain loss.
The estimate of MIDA1/Zuotin phylogeny obtained
using the Zuotin homology region (fig. 2) does not allow
the reconstruction of the pattern of evolution for the
MYB domains present in MIDA1 homologs, because
the tree cannot be rooted in the absence of an outgroup
containing the Zuotin region. However, this analysis
provided valuable information regarding evolutionary
relationships within this group and their patterns of sequence evolution. We found that accommodating the unequal amino acid composition of these proteins resulted
in a significant improvement in model fit (ln L 5
23,408.69 [proportional model]; ln L 5 23,549.09
[Poisson model]; P , 0.001). Use of the PAM model
of evolution (Dayhoff, Schwartz, and Orcutt 1978) to
accommodate differences in the probability for different
types of amino acid substitution resulted in additional
improvement (ln L 5 23,259.26) and further revealed
that the PAM model was better than the related JTT and
BLOSUM models (ln L 5 23,261.02 [JTT]; ln L 5
23,298.20 [BLOSUM]). The Zuotin region exhibits
modest among-sites rate heterogeneity, and the assumption that rates follow a G distribution (with shape parameter a 5 1.07) resulted in a significant improvement
to the PAM model of evolution (ln L 5 23,196.22; P
, 0.001). Similar results were obtained using the J domain alone, regardless of the specific set of sequences
analyzed (data not shown). Despite the improvement in
likelihood score when the PAM model of sequence evolution was used, all models (and methods of phylogenetic inference) supported the same relationships among
these sequences.
Two alternative positions for the root of the
MIDA1/Zuotin phylogeny (fig. 2) are compatible with
current estimates of eukaryotic phylogeny (reviewed by
Braun et al. 1998; Baldauf 1999). Placement of the root
between the plants and an animal-fungal clade (a in fig.
2) is compatible with organismal phylogeny in the absence of ancient gene duplications, and it is also suggested by midpoint rooting (data not shown). This placement suggests the complete loss of MYB domains in
MIDA1-like proteins during the evolution of the fungi.
Placement of the root between MIDA1 and Zuotin proteins (b in fig. 2) could be reconciled with current estimates of organismal phylogeny by postulating (minimally) the loss of the MIDA1 ortholog in the S. cerevisiae lineage along with independent losses of Zuotin
orthologs in the Ecdysozoa and the Arabidopsis lineage.
There is evidence for gene loss in S. cerevisiae, C. ele-
1405
gans, and D. melanogaster based on both large-scale
surveys (Aravind et al. 2000; Braun et al. 2000; Rubin
et al. 2000) and detailed analyses of specific gene families (Braun et al. 1998; Steele, Stover, and Sakaguchi
1999). Since members of at least one other class of
MYB homologs (the c-MYB-like transcription factors
found in plants and animals; Braun and Grotewold
1999) have been lost in S. cerevisiae and C. elegans, it
is especially evident that the gene loss model cannot be
dismissed. Alternatively, support for root b could indicate that previous estimates of eukaryotic phylogeny
supporting the existence of an animal-fungal clade are
incorrect (Wang, Kumar, and Hedges [1999] suggest that
support for this clade is overstated), although we believe
this is unlikely given the distinct lines of evidence supporting the animal-fungal clade (summarized by Braun
et al. 1998; Baldauf 1999).
Thus, the observed phylogeny and the absence of
MYB domains in the fungal Zuotins can be reconciled
with either gene loss or complete domain loss models.
We believe that the difficulties in distinguishing between
these two models with a high degree of confidence are
a major reason why examples of complete domain loss
have not previously been reported. Thus, it is essential
to precisely define the position of the root in the
MIDA1/Zuotin phylogeny (fig. 2).
Placement of the Root in the MIDA1/Zuotin Subgroup
The placement of the root for the MIDA1/Zuotin
phylogeny could be reliably inferred using an outgroup.
However, use of a DnaJ outgroup required limiting the
alignment to the much shorter region of sequence homology corresponding to the J domain, only 75 residues
in length (fig. 1B). To determine the position of the root
within the MIDA1/Zuotin subgroup and establish the
identity of the DnaJ protein most closely related to this
subgroup, we retrieved a large set of DnaJ homologs—
including prokaryotic sequences—and estimated the
phylogeny of J domains related to MIDA1 and Zuotin
(fig. 3).
In phylogenetic trees obtained by NJ of PAM distances estimated using the J domain alignment, there is
75% bootstrap support for an animal-fungal clade containing MIDA1 and Zuotin proteins (fig. 3). Lower levels of bootstrap support for an animal-fungal J domain
clade were evident when using NJ of p-distances and
equally weighted MP (table 1), although neither analysis
provided any support for placement of the root in position b. The data exhibited a good fit to the PAM1G
model of sequence evolution, suggesting that the best
method of phylogenetic estimation used was NJ of PAM
distances, which provided support for root a in figure
2. Indeed, this method appeared to represent the most
powerful test of this hypothesis (see below).
Since the exact set of sequences examined in phylogenetic analyses can have a profound effect on estimates of evolutionary relationships (Lecointre et al.
1993), we wanted to assess the impact of the specific
set of J domain sequences used. Analyses using a subset
of eukaryotic J domains that appeared more closely re-
1406
Braun and Grotewold
FIG. 3.—Phylogeny of J domains related to MIDA1 and Zuotin. Branch lengths were estimated by ML using the PAM1G model of
evolution. Bootstrap support based on neighbor joining of PAM distances (500 replicates) is indicated below branches when it exceeds 50%.
The MIDA1/Zuotin group is shaded for emphasis. Prokaryotic J domains are presented in capital letters, the subgroup used in the eukaryotic
data set are indicated with asterisks, and the rapidly evolving J domains (‘‘fast’’ sequences) are indicated with bold letters. The paralogous
MIDA1-like proteins from Arabidopsis thaliana are differentiated by the chromosomal locations of the genes encoding these proteins (chromosome 3 or chromosome 5).
Table 1
The Impact of Taxon Sampling and Different Methods of Phylogenetic Analysis on
Support for Grouping Animal MIDA1 Homologs and Fungal Zuotins
FAST
DATA SETa
Full. . . . . . . . . . . .
Eukaryotic . . . . . .
GAMMA
SHAPE
SEQUENCES
INCLUDED?b PARAMETERc
Yes
No
Yes
No
1.02
1.09
1.68
1.82
BOOTSTRAP SUPPORT
FOR
ad
NJ with
PAM
NJ with
BLOSUM
NJ with
p-Distances
MP
75.4
97.8
75.0
98.4
81.6
99.4
77.8
98.0
61.8
99.2
67.6
97.6
46.9
79.0
52.7
69.7
NOTE.—NJ 5 neighbor joining; MP 5 maximum parsimony.
a The full data set included all of the sequences listed in Materials and Methods; the eukaryotic data set included the
sequences in groups 1, 2, and 3 listed in Materials and Methods (also see fig. 3).
b Inclusion or exclusion of the divergent (fast) MIDA1 homologs from Volvox carteri and the Ecdysozoa and Zuotin
from Saccharomyces cervisiae.
c ML estimate of the shape parameter (a) of a G-distribution describing among-sites rare heterogeneity obtained using
the relevant data set.
d Bootstrap support for branch a, supporting a J domain clade containing animal MIDA1 homologs and fungal Zuotin
homologs, using the analyses listed.
Domain Loss in Zuotin Proteins
lated to the MIDA1 group (designated the eukaryotic
data set) were conducted, providing results virtually
identical to those obtained with the larger data set (table
1). Since the MIDA1 homologs from the Ecdysozoa (C.
elegans and D. melanogaster) and V. carteri and Zuotin
from S. cerevisiae appeared divergent regardless of the
region analyzed (see branch lengths in figs. 2 and 3),
we examined the impact of removing these rapidly
evolving (‘‘fast’’) sequences. Although there are clear
limitations imposed by the small number of MIDA1 and
Zuotin sequences available, these analyses substantially
increased bootstrap support for an animal-fungal J domain clade (table 1) for NJ (of both PAM and p-distances) and MP analyses, providing additional evidence
for root a.
Assessing Support for Placement of the Root Using
the Resampled Parametric Bootstrap
Placement of the root at position a in figure 2 suggests that fungal Zuotin proteins evolved from MIDA1like proteins through loss of the MYB domains, providing the first reported example of this type of domain
loss in a specific group of organisms. However, this
placement of the root is contingent on the 75% bootstrap
support for an animal-fungal J domain clade (fig. 3,
branch a). This level of bootstrap support is often considered fairly high, since a number of studies have demonstrated that the bootstrap provides underestimates of
the actual confidence level in a phylogenetic hypothesis
under a number of conditions (Zharkikh and Li 1992,
1995; Felsenstein and Kishino 1993; Hillis and Bull
1993). However, given the implications for the origin of
Zuotin proteins that root a has, we wanted to examine
support for an animal-fungal J domain clade in a more
rigorous fashion.
The complex relationship between bootstrap proportion and the probability that a specific clade exists
(Hillis and Bull 1993; Zharkikh and Li 1995; Huelsenbeck, Hillis, and Jones 1996) represents a frustrating aspect of bootstrap analyses. Since Monte Carlo simulation allows the analysis of a variety of informative statistics when their distributions cannot be obtained from
theory (Besag and Diggle 1977), we envisioned that it
might provide a solution to this problem. Indeed, Monte
Carlo simulation was one method used to determine the
tendency of the bootstrap to underestimate the support
for specific groups (Hillis and Bull 1993). Monte Carlo
simulation has also been used extensively to examine
bias in phylogenetic analyses (e.g., Huelsenbeck, Hillis,
and Jones 1996; Huelsenbeck 1998; Chang and Campbell 2000; Sanderson et al. 2000), so an additional benefit of this approach is its ability to determine whether
the position of the J domains from plant MIDA1 homologs might be driven by differences in their evolutionary rates.
To determine the probability of observing a bootstrap value supporting root a as extreme as that observed (fig. 3 and table 1) given the null hypothesis (H0)
of placing the root in position b, 500 data sets were
simulated assuming H0. Bootstrap support for monophy-
1407
ly of animal-fungal MIDA1/Zuotin J domains—the alternative hypothesis (HA) in this analysis—was then estimated for each simulation. Analyses of the eukaryotic
data set using NJ of PAM distances allowed us to reject
H0 (root position b) whether or not the fast sequences
were included (23 simulations $ 75% bootstrap support,
P 5 0.046 [all sequences]; 4 simulations $ 98% bootstrap support, P 5 0.008 [fast sequences removed]), although support was stronger when fast sequences were
removed. Similar results were obtained using the full
data set (P 5 0.06 [all sequences] and P 5 0.02 [fast
sequences removed]), although analyses using the
broader set of sequences required substantially longer
times to run, and the bootstrap support using all sequences was slightly lower than necessary for significance. Thus, depending on the specific data set analyzed,
the support for placement of the root of the MIDA1/
Zuotin subgroup in position a was usually significant
and sometimes highly significant. Since the methodology we employed involves nonparametric bootstrap resampling of data sets generated by Monte Carlo simulation (also called the parametric bootstrap), we call this
Monte Carlo approach the ‘‘resampled parametric
bootstrap.’’
The value of employing the resampled parametric
bootstrap in this study rather than applying a ‘‘rule-ofthumb’’ critical value for the bootstrap (such as 70%) is
revealed by differences in the critical bootstrap value
(the value defining the upper 5% of the null distribution)
calculated by simulation. For this analysis, the critical
value ranges from as little as 73% (for the eukaryotic
data set) to as much as 94% (for the full data set excluding fast sequences). Thus, the inclusion of different
sequences changed both the observed bootstrap proportions and the meaning of specific bootstrap values. This
variation in the correspondence between bootstrap values and confidence in specific clades given differences
in factors such as branch lengths is consistent with the
results of previous simulations (Huelsenbeck, Hillis, and
Jones 1996), suggesting that application of the resampled parametric bootstrap to various problems can provide a much more rigorous test of support for specific
phylogenetic hypotheses.
The resampled parametric bootstrap can provide a
rigorous test for bias due to factors such as long-branch
attraction (Felsenstein 1978), allowing one to determine
whether long branches in a given phylogeny are long
enough to both attract and exhibit the level of bootstrap
support observed from the data. Since the H0 topology
is characterized by a very short internal branch uniting
the animals and the plants, as well as long branches to
the outgroup and plant MIDA1 subgroups (data not
shown), long-branch attraction might explain the observed support for HA (fig. 3 and table 1). Indeed, phylogenetic analyses of some data sets simulated assuming
H0 revealed relationships consistent with HA (table 2).
However, the observed bias did not appear to be extremely strong, with fewer than 60% of simulated replicates supporting HA in analyses based on either NJ of
corrected distances or MP. Since the resampled parametric bootstrap is a method of establishing expected
1408
Braun and Grotewold
Table 2
Bias in Phylogenetic Estimation for MIDA1/Zuotin Phylogenya
PERCENTAGES SUPPORTING HAc
FAST
DATA SETb
Full. . . . . . . . . . . . . . .
Eukaryotic . . . . . . . . .
SEQUENCES
INCLUDED?b
NJ with
PAM
NJ with
BLOSUM
NJ with
p-Distances
MP
Yes
No
Yes
No
38.4
51.2
31.4
52.2
37.2
51.4
34.2
46.6
65.6
86.2
66.2
92.6
30.2
55.4
37.4
57.8
NOTE.—NJ 5 neighbor joining; MP 5 maximum parsimony.
a Tendency of phylogenetic analyses to support H (monophyly animal MIDA1 fungal) when simulations assume H
A
0
(animal-plant monophyly).
b Members of each data set are indicated in figure 3 and table 1.
c Percentage of data sets simulated assuming H that have a clade consistent with branch a when analyzed using the
0
method indicated.
bootstrap proportions given a specific null hypothesis, it
provides a more conservative means to assess the impact
of long-branch attraction on phylogenetic analyses. In
fact, this approach was recently used specifically to examine potential bias in estimates of phylogeny based on
vertebrate rhodopsin sequences (Chang and Campbell
2000). The ability of the resampled parametric bootstrap
to provide information regarding the impact of bias on
phylogenetic analyses might represent an advantage
over other methods available for establishing the statistical significance of bootstrap support (e.g., Zharkikh
and Li 1995; Efron, Halloran, and Holmes 1996).
However, our analyses of J domain phylogeny suggest a need to extend the resampled parametric bootstrap
beyond analyses using simulation under H0, since analyses conducted using either MP or NJ of p-distances did
not provide significant support for HA (P 5 0.118 [MP]
and P 5 0.3 [NJ of p-distances]). Although these results
are consistent with the lower bootstrap proportions observed using these methods (table 1), they raise the
question of why only one method provided significant
support. To address this issue, 500 data sets were simulated assuming HA (monophyly of animal and fungal J
domains), and the proportion of bootstrap proportions
less than the critical value estimated by simulation under
H0 was found, since this corresponded to the type II
error probability. The power of a statistical analysis is
defined as the probability that it can reject the null hypothesis (Cohen 1977), i.e., one minus the probability
of type II error. We found that analyses using NJ of
PAM distances (power 5 0.614) were more powerful
than analyses using MP (power 5 0.442) or NJ of pdistances (power 5 0.3). The results of this dual simulation strategy suggest that the failure of analyses using
either MP or NJ of p-distances to reject H0 could reflect
the lower power of these analyses.
The incorporation of prior knowledge about types
of amino acid substitutions likely to occur during protein
evolution based on analysis of many of homologous
proteins, using evolutionary models such as the PAM
model (Dayhoff, Schwartz, and Orcutt 1978), may increase the power of phylogenetic analyses. However, it
is important to note that the current study assumed that
the PAM1G model of evolution was a sufficiently accurate representation of the actual process of J domain
evolution to provide useful estimates of both type I and
type II error for phylogenetic analyses using this region.
The PAM1G model of evolution had the best fit to the
J domain data (see above), and the observed proportion
of conservative amino acid substitutions in the J domain
was close to that expected under the PAM1G model
(see below), suggesting that it is a reasonable representation of J domain evolution despite the fact that the
actual process of amino acid substitution in proteins is
almost certainly more complex than the process assumed by the PAM1G model.
Despite these concerns regarding differences between the PAM1G model of sequence evolution and the
actual process of amino acid substitution for J domains,
we believe that the simulations conducted as part of this
study provide reasonable estimates of both type I and
type II error because the PAM model of evolution with
among-sites rate heterogeneity captures many notable
aspects of J domain evolution. In fact, the examination
of power using this dual simulation approach and the
resampled parametric bootstrap may provide valuable
information regarding the behavior of different methods
of phylogenetic estimation. Simulations have shown that
the best methods for examining specific phylogenetic
problems can be difficult to predict by evaluating model
fit alone. For example, NJ of p-distances may be more
efficient than NJ of distances estimated using a more
complex model even when the data were generated using the more complex model (Takahashi and Nei 2000).
Likewise, MP or ML using a simple model may also be
more efficient than ML using the true model for some
trees (Hillis, Huelsenbeck, and Swofford 1994; Yang
1997; Huelsenbeck 1998). Thus, NJ or ML analysis using the best-fitting model (based on ML analyses) may
not necessarily represent the best method of phylogenetic inference. Indeed, the dual simulation approach
can even provide information about the best method of
phylogenetic estimation when proteins exhibit patterns
of evolution that have not been implemented as models
in an ML framework.
Evaluating the probability of type II error for phylogenetic analyses using different models, distance
transformations, or weighting schemes could represent
an excellent method of choosing among different analyses. Likewise, the impact of changing the set of se-
Domain Loss in Zuotin Proteins
quences included in a phylogenetic analysis can be evaluated. We found a slight decrease in power for analyses
using the eukaryotic data set with the fast sequences
removed (power 5 0.552). This suggests that inclusion
of J domain sequences from the Ecdysozoa, S. cerevisiae, and V. carteri was beneficial for these analyses,
despite the increased bootstrap support for the animalfungal clade (table 1) and the clear rejection of H0 using
NJ of PAM distances after eliminating divergent J domain sequences from the fast taxa (see above). Furthermore, some of the fast J domain sequences were involved in rearrangements within the animal MIDA1 and
fungal Zuotin group, with some analyses suggesting that
S. cerevisae Zuotin is the outgroup of the animal and
fungal sequences (also note the very short branch uniting the fungi in fig. 3). The strong support for monophyly of the fungi and other eukaryotic kingdoms (fig.
2) suggests that the inference of different topologies for
the MIDA1/Zuotin J domains reflects the limited length
of the J domain. This further emphasizes the need for
caution in placing the root of this group and underscores
the need to conduct the simulations we used to examine
the position of the root. Analyses excluding the divergent J domain sequences exhibited a greater degree of
bias in phylogenetic reconstruction (table 2), probably
explaining the decrease in power.
Despite the limited power of all analytical approaches used, which highlighted the need for caution
when interpreting phylogenetic analyses of relatively
short sequence alignments, such as the J domain alignment analyzed here, we believe that the rejection of H0
using NJ of PAM distances provided evidence for root
a. The limited power of analyses using either MP or NJ
of p-distances also provided a reasonable explanation
for the inability of these analyses to support root a, and
it is important to note that none of these analyses supported root b. Taken as a whole, the analyses we conducted provided evidence that fungal Zuotin proteins
were derived from MIDA1-like proteins present in the
common ancestor of animals, plants, and fungi by complete loss of the carboxyl-terminal MYB domains.
Comparison Between Patterns of Evolution for the J
Domain and the PAM Model of Evolution
An often untested aspect of parametric tests of evolutionary hypotheses is the impact of model inaccuracy
on their results. Although the PAM model of sequence
evolution exhibited a better fit to the evolution of the J
domain than the other models tested (the JTT and
BLOSUM models; see above), an untested model that
fits the data even better may exist.
To examine the fit of the PAM model, the types of
amino acid changes that occurred in the J domain were
reconstructed by equally weighted parsimony. As we expected, a substantial bias toward conservative changes
was evident in the J domain region (213 of 416 unambiguous changes were conservative substitutions, using
the categories described by Dayhoff, Schwartz, and Orcutt [1978]). This number of conservative changes is
significantly greater than that expected under an uncon-
1409
strained model (60.3 conservative substitutions expected, x2 5 440.23, P , 0.001) or a model constrained to
amino acid substitutions involving single-nucleotide
substitutions (145.6 conservative substitutions expected,
x2 5 48.0, P , 0.001). However, the number of observed conservative changes (213) was only slightly
greater than expected under the PAM1G model of evolution (191 conservative substitutions expected, x2 5
4.63, P 5 0.032). Comparison of observed and expected
numbers of substitutions in each amino acid category
revealed that the deviation from the PAM model resulted
from greater numbers of conservative changes involving
the hydrophobic (55 observed and 27.3 expected, x2 5
30.13, P , 0.001) and aromatic (20 observed and 10.3
expected, x2 5 9.5, P 5 0.002) groups than expected
under the PAM1G model. The observed numbers of
substitutions in other categories did not differ from expectation under the PAM1G model (P . 0.1). Thus, the
PAM1G model of evolution appeared to exhibit a relatively good fit to the J domain data with the exception
of the excess substitutions in two specific amino acid
classes.
To examine the robustness of phylogenetic analyses
to the observed deviation from the PAM model, we examined bootstrap support for an animal-fungal J domain
clade (fig. 3, branch a) based on NJ of distance estimates obtained using a model of evolution based on the
BLOSUM matrix (Henikoff and Henikoff 1992). The
BLOSUM model was more tolerant of substitutions involving both hydrophobic and aromatic amino acids
than the PAM model, probably explaining the poorer fit
of this model to the data in ML analyses (see above).
Although NJ analyses using BLOSUM distances resulted in an increased probability of type II error (power 5
0.494), the observed level of bootstrap support was similar to that obtained using NJ of PAM distances (table
1), and we found significant support for the existence of
an animal-fungal J domain clade (P 5 0.046 with the
eukaryotic data set). Likewise, NJ of BLOSUM distances did not result in substantially increased bias in
phylogenetic reconstruction (table 2). These results suggest that the results of NJ of corrected distances for the
J domain are relatively robust to deviations between the
model of sequence evolution used to obtain distance estimates and the actual pattern of evolution, providing
additional support for our conclusions regarding the origin of fungal Zuotins by the process of complete domain loss in the fungal lineage.
Despite the support for root a provided by parametric approaches to phylogenetic analyses (see above),
we wanted to examine the amino acid changes that supported different placements of the root for MIDA1 and
Zuotin phylogeny. Thus, we looked for potential synapomorphies (shared derived character states) uniting either the animal MIDA1 and fungal Zuotin J domains
(root a) or the animal and plant MIDA1 J domains (root
b). Examination of ancestral state reconstructions for all
sites in the J domain alignment revealed the presence of
two sites (aligned to Val 124 and Lys 150 of the S.
cerevisiae Zuotin sequence) supporting root a and the
absence of sites supporting root b, consistent with our
1410
Braun and Grotewold
phylogenetic analyses supporting root a (fig. 3 and table
1). The first site supporting root a involved a change
from Ala in the plant and outgroup J domains to Val in
the animals and fungi, while the second involved a
change from Glu in the plant and outgroup J domains
to Lys in the animals and fungi. Volvox carteri GlsA
differs from animal, fungal, plant, and outgroup J domains at both of these sites (with Cys and Asp residues
aligned to Val 124 and Lys 150 of the S. cerevisiae
Zuotin sequence, respectively). Since V. carteri GlsA is
one of the divergent sequences (see branch lengths in
figs. 2 and 3), the existence of unique (autapomorphic)
character states in this sequence is not surprising. We
believe that the support for root a from parametric analyses, coupled with the absence of any sites potentially
supporting monophyly of J domains from animal and
plant MIDA1-like proteins, provides strong evidence for
our hypothesis that fungal Zuotin proteins evolved from
MIDA1-like ancestors through complete domain loss.
Additional Applications of the Resampled Parametric
Bootstrap
The use of a dual simulation approach to examine
both type I and type II error represents an excellent
method for examining specific questions in molecular
evolution. Previous simulation studies to examine power
in phylogenetic analyses have focused on establishing
the number of sites necessary to have a given probability
of reconstructing the correct topology (Hillis, Huelsenbeck, and Swofford 1994; Huelsenbeck, Hillis, and
Jones 1996). The dual simulation approach we employed here could extend this approach to power analysis by establishing the number of sites necessary to
reject a null hypothesis with specific type I and type II
error probabilities, such as the values used by convention in many studies (probability of type I error 5 0.05
and power 5 0.8; see Cohen 1977). However, we want
to emphasize that the dual simulation approach used
here is not limited to the resampled parametric bootstrap. Indeed, it could also be applied to tests that compare differences in ML, MP, or minimum evolution
(ME) scores with a null distribution generated by simulation (e.g., Swofford et al. 1996; a Monte Carlo approach called the ‘‘SOWH test’’ by Goldman, Anderson,
and Rodrigo [2000]).
The use of the resampled parametric bootstrap and
either NJ or MP with a fast search algorithm (e.g., the
methods used by Takahashi and Nei 2000) may have
benefits relative to the SOWH test for some phylogenetic problems. The SOWH test requires the identification of the optimal topology using ML, MP, or ME for
each simulated data set, which may not be tractable from
a computational standpoint for data sets with a large
number of sequences. Since increased taxon sampling
has been suggested to improve the accuracy of phylogenetic estimation in some cases (Lecointre et al. 1993;
Hillis 1996), the ability to conduct simulation-based
analyses of such large data sets is clearly desirable. In
addition, searches for optimal topologies may not result
in the identification of the true topology when applied
to data sets with a limited number of aligned sites, like
the J domain (fig. 1). In fact, simulation studies have
demonstrated that the true topology often has a slightly
suboptimal ML, MP, or ME score in analyses of short
alignments (see Nei, Kumar, and Takahashi 1998). Even
if the true topology has the optimal ML, MP, or ME
score, it may be one of multiple equally optimal topologies identified using these criteria when the alignment
is short. Thus, it has been suggested that phylogenetic
analyses of short alignments should focus on establishing support for specific groups rather than conducting
extensive tree searches in order to identify optimal topologies (Nei, Kumar, and Takahashi 1998). Since the
resampled parametric bootstrap uses the support for a
specific group as a test statistic, it may have advantages
over the SOWH test (which uses differences between
the values of optimality criteria as a test statistic) for
analyses of short alignments.
In this study, all simulations were focused on testing a single null hypotheses developed by combining
information about the domain structures of MIDA1 and
Zuotin proteins, previous estimates of eukaryotic phylogeny, and estimates of phylogeny obtained using the
larger Zuotin homology region. Multiple simulations
were used to assess the impact of changing both the set
of sequences analyzed and the parameters describing the
evolutionary process on our tests of this null hypothesis.
Thus, the null hypothesis was rejected in this study with
P-critical 5 0.05, with the results of different simulations interpreted as evidence that differences in taxon
selection and parameter estimates had limited impact on
the null distribution of bootstrap proportions. The similarities of the null distributions obtained with different
simulations suggest that our conclusions regarding J domain phylogeny are fairly robust (see above). However,
we stress the importance of conducting an appropriate
correction for global type I error (e.g., Rice 1989) when
multiple independent phylogenetic hypotheses are examined. Since corrections of type I error for conducting
multiple independent tests inflate type II error, conducting simulations under the relevant alternative hypotheses
in order to assess the power of phylogenetic analyses
will also provide valuable information when these corrections are used.
Conclusions
The role of domain loss in producing variation in
the distribution of domains in multidomain proteins is
not unexpected, and proteome scale studies of the distribution of multidomain proteins have suggested that
specific processes have acted to balance the formation
of multidomain proteins (Wolf et al. 1999). However,
the relative contributions of protein fission, domain loss
after gene duplication, and complete domain loss remain
unclear. In fact, the complete domain loss apparently
involved in the origin of Zuotin from MIDA1-like proteins can only be demonstrated using complete genome
sequence information for relevant organisms.
The limited sequence length and high degree of
divergence between homologous domains present in
Domain Loss in Zuotin Proteins
many multidomain proteins that exhibit different domain
structures present substantial challenges for the phylogenetic analyses required to distinguish between gene
loss, domain gain, and domain loss. The dual simulation
strategy described here allows examination of the impact
that both limited sequence length and high degrees of
sequence divergence have on type I and type II error in
phylogenetic estimation. This method could be combined with broader surveys of complete domain loss as
a useful strategy for the identification of proteins that
have undergone functional changes during evolution. As
genomic sequence data accumulate, it should become
possible to conduct these types of analyses in a framework that explicitly considers the probabilities of domain gain and loss, as well as the probabilities of gene
duplication and gene loss. Indeed, considering the possibility of complete domain loss may alter the interpretation of previous surveys of gene loss in particular
groups of organisms (e.g., Aravind et al. 2000; Braun
et al. 2000), since some putative gene loss events revealed by these surveys may correspond only to (complete) domain losses.
Acknowledgments
We thank Rebecca Kimball for helpful discussions
regarding both phylogenetic analyses and the use of
Monte Carlo simulation and for careful reading of this
manuscript. We are also grateful for the comments of
two anonymous reviewers, which improved this manuscript. This work was supported by the USDA (fellowship 1999-01582 to E.L.B), the National Science Foundation (grant MCB-9896111 to E.G.), and grants to E.G.
from Pioneer Hi-Bred International and the Ohio State
University Office of Research.
LITERATURE CITED
AGUINALDO, A. M. A., J. M. TURBEVILLE, L. S. LINFORD, M.
C. RIVERA, J. R. GAREY, R. A. RAFF, and J. A. LAKE. 1997.
Evidence for a clade of nematodes, arthropods and other
moulting animals. Nature 387:489–493.
ALTSCHUL, S. F., T. L. MADDEN, A. A. SCHÄFFER, J. H. ZHANG,
Z. ZHANG, W. MILLER, and D. J. LIPMAN. 1997. Gapped
BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.
ARAVIND, L., H. WATANABE, D. J. LIPMAN, and E. V. KOONIN.
2000. Lineage-specific loss and divergence of functionally
linked genes in eukaryotes. Proc. Natl. Acad. Sci. USA 97:
11319–11324.
BALDAUF, S. L. 1999. A search for the origins of animals and
fungi: comparing and combining molecular data. Am. Nat.
154:S178–S188.
BALDAUF, S. L., and J. D. PALMER. 1993. Animals and fungi
are each other’s closest relatives: congruent evidence from
multiple proteins. Proc. Natl. Acad. Sci. USA 90:11558–
11562.
BATEMAN, A., E. BIRNEY, R. DURBIN, S. R. EDDY, K. L.
HOWE, and E. L. L. SONNHAMMER. 2000. The PFAM protein families database. Nucleic Acids Res. 28:263–266.
BESAG, J., and P. J. DIGGLE. 1977. Simple Monte Carlo tests
for spatial patterns. Appl. Stat. 26:327–333.
1411
BRAUN, E. L., and E. GROTEWOLD. 1999. Newly discovered
plant c-myb-like genes rewrite the evolution of the plant
myb gene family. Plant Physiol. 121:21–24.
BRAUN, E. L., A. L. HALPERN, M. A. NELSON, and D. O. NATVIG. 2000. Large-scale comparison of fungal sequence information: mechanisms of innovation in Neurospora crassa
and gene loss in Saccharomyces cerevisiae. Genome Res.
10:416–430.
BRAUN, E. L., S. KANG, M. A. NELSON, and D. O. NATVIG.
1998. Identification of the first fungal annexin: analysis of
annexin gene duplications and implications for eukaryotic
evolution. J. Mol. Evol. 47:531–543.
CHANG, B. S. W., and D. L. CAMPBELL. 2000. Bias in phylogenetic reconstructions of vertebrate rhodopsin sequences.
Mol. Biol. Evol. 17:1220–1231.
COHEN, J. 1977. Statistical power analysis for the behavioral
sciences. Academic Press, New York.
DAYHOFF, M. O., R. M. SCHWARTZ, and B. C. ORCUTT. 1978.
A model of evolutionary change in proteins. Pp. 345–352
in M. O. DAYHOFF, ed. Atlas of protein sequence and structure. National Biomedical Research Foundation, Silver
Springs, Md.
DOOLITTLE, R. F. 1995. The multiplicity of domains in proteins.
Annu. Rev. Biochem. 64:287–314.
EFRON, B., E. HALLORAN, and S. HOLMES. 1996. Bootstrap
confidence levels for phylogenetic trees. Proc. Natl. Acad.
Sci. USA 93:13429–13434.
ENRIGHT, A. J., I. ILIOPOULOS, N. C. KYRPIDES, and C. A.
OUZOUNIS. 1999. Protein interaction maps for complete genomes based on gene fusion events. Nature 402:86–90.
FELSENSTEIN, J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27:
401–410.
———. 1985. Confidence limits on phylogenies—an approach
using the bootstrap. Evolution 39:783–791.
———. 1988. Phylogenies from molecular sequences—inference and reliability. Annu. Rev. Genet. 22:521–565.
FELSENSTEIN, J., and H. KISHINO. 1993. Is there something
wrong with the bootstrap on phylogenies—a reply. Syst.
Biol. 42:193–200.
GOLDMAN, N., J. P. ANDERSON, and A. G. RODRIGO. 2000.
Likelihood-based tests of topologies in phylogenetics. Syst.
Biol. 49:652–670.
GOLDMAN, N., and S. WHELAN. 2000. Statistical tests of gamma-distributed rate heterogeneity in models of sequence
evolution in phylogenetics. Mol. Biol. Evol. 17:975–978.
GRASSLY, N. C., J. ADACHI, and A. RAMBAUT. 1997. PSeqGen: an application for the Monte Carlo simulation of protein sequence evolution along phylogenetic trees. Comput.
Appl. Biosci. 13:559–560.
HENIKOFF, S., E. A. GREENE, S. PIETROKOWSKI, P. BORK, T. K.
ATTWOOD, and L. HOOD. 1997. Gene families: the taxonomy of protein paralogs and chimeras. Science 278:609–
614.
HENIKOFF, S., and J. G. HENIKOFF. 1992. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci.
USA 89:10915–10919.
HILLIS, D. M. 1996. Inferring complex phylogenies. Nature
383:130–131.
HILLIS, D. M., and J. J. BULL. 1993. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst. Biol. 42:182–192.
HILLIS, D. M., J. P. HUELSENBECK, and D. L. SWOFFORD. 1994.
Hobgoblin of phylogenetics? Nature 369:363–364.
HUELSENBECK, J. P. 1998. Systematic bias in phylogenetic
analysis: is the Strepsiptera problem solved? Syst. Biol. 47:
519–537.
1412
Braun and Grotewold
HUELSENBECK, J. P., D. M. HILLIS, and R. JONES. 1996. Parametric bootstrapping in molecular phylogenies: applications
and performance. Pp. 19–45 in J. D. FERRARIS and S. R.
PALUMBI, eds. Molecular zoology. Wiley-Liss, New York.
HUYNEN, M. A., and P. BORK. 1998. Measuring genome evolution. Proc. Natl. Acad. Sci. USA 95:5849–5856.
KELLEY, W. L. 1998. The J-domain family and the recruitment
of chaperone power. Trends Biochem. Sci. 23:222–227.
LECOINTRE, G., H. PHILIPPE, H. L. V. LÊ, and H. LE GUYADER.
1993. Species sampling has a major impact on phylogenetic
inference. Mol. Phylogenet. Evol. 2:205–224.
LI, W.-H. 1997. Molecular evolution. Sinauer, Sunderland,
Mass.
MADDISON, W. P., and D. R. MADDISON. 1992. MacClade: analysis of phylogeny and character evolution. Sinauer, Sunderland, Mass.
MARCOTTE, E. M., M. PELLEGRINI, H. L. NG, D. W. RICE, T.
O. YEATES, and D. EISENBERG. 1999. Detecting protein
function and protein-protein interactions from genome sequences. Science 285:751–753.
MATSUMOTO-TANIURA, N., F. PIROLLET, R. MONROE, L. GERACE, and J. M. WESTENDORF. 1996. Identification of novel
M phase phosphoproteins by expression cloning. Mol. Biol.
Cell 7:1455–1469.
MILLER, S. M., and D. L. KIRK. 1999. GlsA, a Volvox gene
required for asymmetric division and germ cell specification, encodes a chaperone-like protein. Development 126:
649–658.
NEI, M., S. KUMAR, and K. TAKAHASHI. 1998. The optimization principle in phylogenetic analysis tends to give incorrect topologies when the number of nucleotides or amino
acids used is small. Proc. Natl. Acad. Sci. USA 95:12390–
12397.
NELSON, R. J., T. ZIEGELHOFFER, C. NICOLET, M. WERNERWASHBURNE, and E. A. CRAIG. 1992. The translation machinery and 70 kD heat shock protein cooperate in protein
synthesis. Cell 71:97–105.
NIKOH, N., N. HAYASE, N. IWABE, K. KUMA, and T. MIYATA.
1994. Phylogenetic relationship of the kingdoms Animalia,
Plantae, and Fungi, inferred from 23 different protein species. Mol. Biol. Evol. 11:762–768.
OUZOUNIS, C. A., and A. G. PAPAVASSILIOU. 1997. DNA-binding motifs of eukaryotic transcription factors. Pp. 1–21 in
A. G. PAPAVASSILIOU, ed. Transcription factors in eukaryotes. R. G. Landes, Austin, Tex.
PHILIPPE, H., and J. LAURENT. 1998. How good are deep phylogenetic trees? Curr. Opin. Genet. Dev. 8:616–623.
RICE, W. R. 1989. Analyzing tables of statistical tests. Evolution 43:223–225.
RODRIGUEZ-MONGE, L., C. A. OUZOUNIS, and N. C. KYRPIDES.
1998. A ferredoxin-like domain in RNA polymerase 30/40kDa subunits. Trends Biochem. Sci. 23:169–170.
ROSE, A., I. MEIER, and U. WIENAND. 1999. The tomato I-box
binding factor LeMYBI is a member of a novel class of
Myb-like proteins. Plant J. 20:641–652.
RUBIN, G. M., M. D. YANDELL, J. R. WORTMAN et al. (55 coauthors). 2000. Comparative genomics of the eukaryotes.
Science 287:2204–2215.
SAITOU, N., and M. NEI. 1987. The neighbor-joining method—
a new method for reconstructing phylogenetic trees. Mol.
Biol. Evol. 4:406–425.
SANDERSON, M. J., M. F. WOJCIECHOWSKI, J. M. HU, T. S.
KHAN, and S. G. BRADY. 2000. Error, bias, and long-branch
attraction in data for two chloroplast photosystem genes in
seed plants. Mol. Biol. Evol. 17:782–797.
SHOJI, W., T. INOUE, T. YAMAMOTO, and M. OBINATA. 1995.
MIDA1, a protein associated with Id, regulates cell growth.
J. Biol. Chem. 270:24818–24825.
SNEL, B., P. BORK, and M. HUYNEN. 2000. Genome evolution.
Gene fusion vs. gene fission. Trends Genet. 16:9–11.
STEELE, R. E., N. A. STOVER, and M. SAKAGUCHI. 1999. Appearance and disappearance of Syk family protein-tyrosine
kinase genes during metazoan evolution. Gene 239:91–97.
SWOFFORD, D. L. 2000. PAUP*. Phylogenetic analysis using
parsimony (*and other methods). Version 4.0b4a. Sinauer,
Sunderland, Mass.
SWOFFORD, D. L., G. J. OLSEN, P. J. WADDELL, and D. M.
HILLIS. 1996. Phylogenetic inference. Pp. 407–514 in D. M.
HILLIS, C. MORITZ, and B. K. MABLE, eds. Molecular systematics. Sinauer, Sunderland, Mass.
TAKAHASHI, K., and M. NEI. 2000. Efficiencies of fast algorithms of phylogenetic inference under the criteria of maximum parsimony, minimum evolution, and maximum likelihood when a large number of sequences are used. Mol.
Biol. Evol. 17:1251–1258.
THOMPSON, J. D., D. G. HIGGINS, and T. J. GIBSON. 1994.
CLUSTAL W—improving the sensitivity of progressive
multiple sequence alignment through sequence weighting,
position-specific gap penalties and weight matrix choice.
Nucleic Acids Res. 22:4673–4680.
WAINRIGHT, P. O., G. HINKLE, M. L. SOGIN, and S. K. STICKEL.
1993. Monophyletic origins of the metazoa: an evolutionary
link with fungi. Science 260:340–342.
WANG, D. Y. C., S. KUMAR, and S. B. HEDGES. 1999. Divergence time estimates for the early history of animal phyla
and the origin of plants, animals and fungi. Proc. R. Soc.
Lond. B Biol. Sci. 266:163–171.
WILHELM, M. L., J. REINBOLT, J. GANGLOFF, G. DIRHEIMER,
and F. X. WILHELM. 1994. Transfer RNA binding protein in
the nucleus of Saccharomyces cerevisiae. FEBS Lett. 349:
260–264.
WOLF, Y. I., S. E. BRENNER, P. A. BASH, and E. V. KOONIN.
1999. Distribution of protein folds in the three superkingdoms of life. Genome Res. 9:17–26.
YAN, W., B. SCHILKE, C. PFUND, W. WALTER, S. KIM, and E.
A. CRAIG. 1998. Zuotin, a ribosome-associated DnaJ molecular chaperone. EMBO J. 17:4809–4817.
YANG, Z. 1994. Maximum likelihood phylogenetic estimation
from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39:306–314.
———. 1997. How often do wrong models produce better
phylogenies? Mol. Biol. Evol. 14:105–108.
ZHANG, S., C. LOCKSHIN, A. HERBET, E. WINTER, and A. RICH.
1992. Zuotin, a putative Z-DNA binding protein in Saccharomyces cerevisae. EMBO J. 11:3787–3796.
ZHARKIKH, A., and W.-H. LI. 1992. Statistical properties of
bootstrap estimation of phylogenetic estimation of phylogenetic variability from nucleotide sequences. II. Four taxa
without a molecular clock. J. Mol. Evol. 35:356–366.
———. 1995. Estimation of confidence in phylogeny: the
complete-and-partial bootstrap technique. Mol. Phylogenet.
Evol. 4:44–63.
ELIZABETH KELLOGG, reviewing editor
Accepted March 23, 2001