Functional Divergence Prediction from Evolutionary Analysis: A Case
Study of Vertebrate Hemoglobin
Simonetta Gribaldo,1 Didier Casane,1 Philippe Lopez, and Hervé Philippe
Phylogénie, Bioinformatique et Génome, Université Pierre et Marie Curie, Paris, France
Introduction
The prediction and analysis of functional subtypes in
protein families is one of the highest priorities of
postgenomics studies (Eisenberg et al. 2000; Enright and
Ouzounis 2001; Bickel et al. 2002; Gaucher et al. 2002b;
Jensen, Skovgaard, and Brunak 2002; Lichtarge and Sowa
2002; Yang 2002; Blouin, Boucher, and Roger 2003).
Besides predicting specialization in subgroups of homologs, it is of the utmost importance to devise in silico
methodologies aimed at identifying the residues involved
in such a process. This is just beginning to become
possible, given the large number of sequences necessary for
such analyses. Diverse approaches have been proposed,
mostly based on multiple alignments (Casari, Sander, and
Valencia 1995; Lichtarge, Bourne, and Cohen 1996;
Hannenhalli and Russell 2000). It is well established that
many aspects of comparative biology can benefit from
evolutionary studies (Felsenstein 1985), and evolutionary
information is very helpful in the prediction of gene
function (Eisen 1998; Marcotte et al. 1999; Pellegrini et al.
1999). Recently, molecular phylogenetics tools have been
used to tackle this issue via the identification of changes in
rates of substitutions over time, under the principle that
site-specific switches in evolutionary rates in a member of
a duplicated gene couple are a signature of functional
diversification (Gu 1999, 2001; Naylor and Gerstein 2000;
Gaucher, Miyamoto, and Benner 2001; Knudsen and
Miyamoto 2001; Gaucher et al. 2002a, 2002b; Blouin,
Boucher, and Roger 2003).
1 These authors contributed equally to this work.
Key words: covarion, evolutionary rate, hemoglobin, heterotachy,
protein function, tertiary structures.
E-mail: [email protected].
Mol. Biol. Evol. 20(11):1754–1759. 2003
DOI: 10.1093/molbev/msg171
Molecular Biology and Evolution, Vol. 20, No. 11,
© Society for Molecular Biology and Evolution 2003; all rights reserved.
The switch of constraints on positions over time is
a poorly understood phenomenon. Indeed, although the
notion that not all sites in a protein are subjected to the same
evolutionary forces is well established (Kimura 1983), a site
can show dramatic changes in substitution rates on separate
parts of a phylogeny. Evidence for such behavior dates back
to the 1970s, with the formulation of Fitch’s covarion
model of molecular evolution (Fitch and Markowitz 1970).
The term heterotachy (Greek for different speed) was
recently coined to refine the description of this phenomenon (Philippe and Lopez 2001), as opposed to homotachy,
which indicates a homogeneous substitution rate. Because
heterotachy (i.e., lineage-specific substitution rate shifts)
reflects constraint variation on specific sites of a protein
structure across time, it is generally regarded as a hallmark
of functional divergence (Gaucher et al. 2002b). Under this
reasoning, the identification of heterotachous profiles
between paralogous genes would be potentially informative
for structure/function prediction analyses, because gene
duplication is a major source of functional innovation
(Ohno, Wolf, and Atkin 1968). Various approaches have
been applied to a number of paralogous families in order to
identify shifts in replacement rates that may be indicative of
their functional diversification after duplication (Gu 1999,
2001; Gaucher, Miyamoto, and Benner 2001; Knudsen and
Miyamoto 2001; Gaucher et al. 2002a, 2002b). Recently,
Naylor and Gerstein (2000) employed Gu’s coefficient of
functional divergence (Gu 1999) to identify shifts in
variability profiles between alpha and beta globins for
three groups of mammals as a measure of their specialization over evolutionary time. Because they observed
marked rate shifts between alpha and beta globins, but not
within each subunit, they concluded that this approach can
Downloaded from http://mbe.oxfordjournals.org/ by guest on September 7, 2013
It is a central assumption of evolution that gene duplications provide the genetic raw material from which to create
proteins with new functions. The increasing availability of multigene family sequences that has resulted from genome
projects has inspired the creation of novel in silico approaches to predict details of protein function. The underlying
principle of all such approaches is to compare the evolutionary properties of homologous sequence positions in
paralogous proteins. It has been proposed that the positions that show switches in substitution rate over time, i.e.,
“heterotachous sites,” are good indicators of functional divergence. Here, we analyzed the α and β paralogous subunits
of hemoglobin in search of such signatures. We found as many heterotachous sites in comparisons between groups of
paralogous subunits (α/β) as between orthologous ones (α/α, β/β). Thus, the importance of substitution rate shifts as
predictors of specialization between protein subfamilies may need to be reconsidered. Instead, such shifts may reflect a more
general process of protein evolution, consistent with the fact that they can be compatible with function conservation. As
an alternative, we focused on those residues showing highly constrained states in two sequence groups, but different in
each group, and we named them CBD (for “constant but different”). In contrast to heterotachous positions, CBD sites
were markedly overrepresented in paralogous (α/β) comparisons relative to orthologous ones (α/α, β/β), identifying
them as likely signatures of functional specialization between the two subunits. When superimposed onto the three-dimensional structure of hemoglobin, CBD positions consistently appeared to cluster preferentially on inter-subunit
surfaces, two contact areas crucial to function in vertebrate tetrameric hemoglobin. The identification and analysis of
CBD sites by complementing structural information with evolutionary data may represent a promising direction for
future studies dealing with the functional characterization of a growing number of multigene families identified by
complete genome analyses.
Methods
Sequence Retrieval, Alignment, and Taxon Sampling
Vertebrate globin homologs were obtained from
public databases (http://www.ncbi.nlm.nih.gov/) by using
the automatic retrieval program AliBaba (Lopez, unpublished), and manually aligned with the aid of the ED
program included in the MUST suite (Philippe 1993). We
discarded 7 alpha and 8 beta globin sequences, as they
belonged to poorly sampled monophyletic groups (chondrichthyans, coelacanths, dipnoi, amphibians). We retained α and β subunit sequences from three large
monophyletic groups: mammals, teleost fishes, and
Sauropsida (birds, crocodiles, lepidosaurs, and turtles).
After removal of ambiguously aligned regions, a data set
was assembled comprising 234 α and 213 β sequences
totaling 132 amino acid positions. The data set was
subdivided into six subalignments corresponding to the
α and β sequences from each taxonomic group (145 α
and 145 β from mammals; 46 α and 57 β from teleosts;
45 α and 49 β from Sauropsida). Alignments and
accession numbers are available as online Supplementary
Material.
Phylogenetic Analyses and Study of Site-Specific
Evolutionary Behavior
Heterotachy was tested as previously described
(Lopez, Casane, and Philippe 2002). Briefly, the number
of substitutions at each site was calculated on a phylogenetic tree obtained from each of the six subalignments
(corresponding to the α and β sequence groups of each of the three
taxonomic clusters). Substitution numbers were inferred by
maximum likelihood (ML) using PAML (Yang 1997) with
the JTT + F + Γ model. Each position was described by
a profile indicating the numbers of inferred substitutions
for every group. The program HTACH (Philippe, unpublished) was then employed for all possible binary
comparisons to classify positions as (1) homotachous, (2) heterotachous, (3) constant, or (4) constant
but different (CBD). Positions displaying fewer than three substitutions in total were classified as untestable, because
no statistical test can detect a difference in such cases (this
corresponds to the criterion “a total number of substitutions smaller than half the number of groups” used by
Lopez, Forterre, and Philippe [1999]). Positions with only
one change in one terminal branch were considered for
categories (3) or (4), because this difference is likely the
result of sequencing error.
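The classification rules above can be illustrated with a minimal Python sketch. It is not the HTACH program, which is unpublished; the function names, the exact binomial test, the use of total tree lengths to set the expected share of substitutions, and the 0.05 threshold are all assumptions made for illustration only. The special handling of a single change on a terminal branch is not modeled here.

```python
from math import comb

def binomial_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value: total probability of all
    outcomes that are no more likely than the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    threshold = probs[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(q for q in probs if q <= threshold))

def classify_site(subs_a, subs_b, state_a, state_b,
                  len_a, len_b, min_total=3, alpha=0.05):
    """Classify one alignment position for one pair of sequence groups.

    subs_a, subs_b   : inferred substitution counts in each group
    state_a, state_b : residue of each group when it is invariant, else None
    len_a, len_b     : total tree lengths of the two groups; used here
                       (as an assumption) to set the expected share
    """
    total = subs_a + subs_b
    if total == 0:
        # No change in either group: same residue or different residue.
        return "constant" if state_a == state_b else "CBD"
    if total < min_total:
        # Too few substitutions for any test to detect a difference.
        return "untestable"
    expected_share = len_a / (len_a + len_b)
    p_value = binomial_two_sided_p(subs_a, total, expected_share)
    return "heterotachous" if p_value < alpha else "homotachous"
```

For instance, under these assumptions `classify_site(0, 0, "Y", "R", 1.0, 1.0)` falls in the CBD category, whereas a position with ten substitutions in one group and none in the other, on trees of equal length, is flagged as heterotachous.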
Structure Analyses
The different classes of sites were superimposed onto
three-dimensional hemoglobin structures retrieved from
the Protein Data Bank (Berman et al. 2000), by using the
user script option in RasMol (Sayle and Milner-White
1995). Side chain solvent accessibility was calculated with
the program Access (http://www.csb.yale.edu/userguides/datamanip/access/access_descrip.html),
but because the various types of positions display similar values, this is not
discussed here (see table S1 in the online Supplementary
Material). Mutational data were retrieved from the Databases of Human Hemoglobin Variants and Other Resources
through the Globin Gene Server at http://globin.cse.psu.edu/hbvar/menu.html
(Hardison et al. 2002).
successfully pinpoint functional differentiation. However,
because a very large number of sequences are needed to
assess site-specific rate shifts with statistical confidence
(Lopez, Casane, and Philippe 2002), these results might
have been biased by both sparse sampling and limited
evolutionary distances. In addition to sites with replacement rate switches, highly constrained residues may also
hold information regarding functional divergence. For
instance, sites that switched state between two diverging
proteins, but that nonetheless conserved a high evolutionary constraint and are consequently not identified in
analyses of heterotachous positions, might be potentially
important in the process of functional specialization.
Although pinpointed by different approaches (Lichtarge,
Bourne, and Cohen 1996; Gu 2001), this type of site
remains poorly investigated in functional genomics studies.
We applied an approach coupling evolutionary and
structural knowledge to survey the sites potentially responsible for functional specialization in the two subunits
of vertebrate hemoglobin. Recently, such an approach was
employed to investigate the fine functional differences
between elongation factor homologs in bacteria and
eukarya (Gaucher, Miyamoto, and Benner 2001; Gaucher
et al. 2002a). Our choice was driven by the considerable
representation of hemoglobin sequences in molecular
databases, and by the vast amount of structural and
functional information available. Tetrameric vertebrate
hemoglobin consists of two identical α subunits of 141
residues and of two identical β subunits of 146 residues,
each containing one heme group. Subunits α and β are
paralogous peptides arising from an ancient gene duplication at the base of jawed vertebrates. Oxygen binding is
cooperative and is associated with a large shift in the
quaternary structure of the heterotetramer, from the deoxy
(T) to the oxy (R) forms, as one dimer rotates relative to the
other. Two types of subunit interfaces are implicated in
such transition, namely α1β1 (or α2β2) and α1β2 (or
α2β1), also referred to as packing and sliding surfaces,
respectively (Perutz 1970). During the transition from the T
to R state, the α1β2-subunit interface undergoes a dramatic
sliding movement, whereas the α1β1-subunit interface
remains essentially unchanged. Mutational studies have
demonstrated the importance of both inter-subunit and
intra-subunit contact regions for critical hemoglobin
properties such as oxygen affinity and cooperativity
(Shionyu, Takahashi, and Go 2001). We sought to identify
and study the set of positions potentially implicated in the
development of such highly functional interactions over the
divergence of α and β subunits. We first studied heterotachous positions and found a similar level of variability-profile shift between orthologs and between paralogs,
suggesting that heterotachous positions may be poor
signatures of functional divergence. Then we turned instead
to examining more constrained positions, which appeared
to be much more reliable predictors.
Results and Discussion
FIG. 1.—Distribution of heterotachous (H) and conserved but
different (CBD) sites in paired orthologous and paralogous comparisons.
(a) Schematic phylogenetic representation of α and β vertebrate globin
families. The numbers of taxa used for each group are given in
parentheses. (b) Distribution of H sites in orthologous (open circles) and
in paralogous (solid circles) comparisons. Values on the Y axis indicate
numbers of sites, and values on the X axis are the average Kimura
distances between the two corresponding groups. (c) Distribution of CBD
sites in orthologous (open circles) and in paralogous (solid circles)
comparisons. Values on Y and X axes are as in (b).
1991), whereas the α55 homolog does not appear to be
involved in any essential interaction.
Nevertheless, these examples should be considered as
special cases in which substitution rate shifts are linked to
function. In fact, the similar levels of heterotachy observed
in both ortholog (i.e., no functional divergence) and
paralog (i.e., functional divergence) comparisons (fig.
1b), coupled with the fact that H sites were evenly
dispersed over the three-dimensional structure (fig. 2a),
indicate that, overall, the use of heterotachy as a reliable
indicator of functional divergence should be reconsidered.
We retrieved and aligned a large sample of α and
β globins from three vertebrate lineages (mammals,
Sauropsida [reptiles and birds], and teleost fishes), totaling
487 taxa. This data set was subdivided into six sequence
groups (corresponding to α and β globin clusters from each
lineage), and evolutionary rates at each site were determined in each group by maximum likelihood (Yang
1997). By comparing these rates between groups of either
paralogous or orthologous sequences (fig. 1a), we were
thus able to follow the evolutionary behavior of each site in
both α and β globins over time. Because our goal was to
identify sites important in the functional divergence of the
two globins, we considered orthologous comparisons
as a negative control and looked for those sites harboring
different properties in paralogous groups of sequences.
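As an illustration of the comparison design, the following Python sketch enumerates all pairwise comparisons among the six sequence groups and labels each one. The labeling rule (same subunit means orthologous, different subunits means paralogous) and all variable names are assumptions introduced here; the sketch does not reproduce any program used in this study.

```python
from itertools import combinations

LINEAGES = ["mammals", "teleosts", "Sauropsida"]
SUBUNITS = ["alpha", "beta"]

# Six sequence groups: one per subunit and lineage.
groups = [(subunit, lineage) for subunit in SUBUNITS for lineage in LINEAGES]

def comparison_type(group_1, group_2):
    """Same subunit in two lineages: orthologous; otherwise paralogous."""
    return "orthologous" if group_1[0] == group_2[0] else "paralogous"

# All possible binary comparisons between distinct groups.
pairs = [(g1, g2, comparison_type(g1, g2))
         for g1, g2 in combinations(groups, 2)]

# Tally the comparisons by type.
counts = {}
for _, _, kind in pairs:
    counts[kind] = counts.get(kind, 0) + 1

for g1, g2, kind in pairs:
    print(f"{g1[0]}/{g1[1]} vs {g2[0]}/{g2[1]}: {kind}")
```

Under this labeling rule, six groups yield 15 binary comparisons, of which 6 are orthologous (α/α and β/β across lineages) and 9 are paralogous (α/β within or across lineages).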
We first studied heterotachous (H) sites, because
a number of analyses have suggested that changes in
variability profiles over time harbor a strong signal of
functional differentiation (Gu 1999, 2001; Naylor and
Gerstein 2000; Gaucher, Miyamoto, and Benner 2001;
Knudsen and Miyamoto 2001; Gaucher et al. 2002a,
2002b; Blouin, Boucher, and Roger 2003). We followed the
frequency of H sites over the evolution of α and β globins
(fig. 1b). Unexpectedly, we found similar proportions of H
positions in both orthologous and paralogous comparisons,
with a mean of ~30% and ~34%, respectively. The
nonparametric Mann-Whitney rank test suggested that the
difference was not significant (w = 40; P < 0.45). In fact,
the higher value of heterotachy in paralogous comparisons
is likely due to the greater evolutionary distances (Lopez,
Casane, and Philippe 2002). Sometimes, the level of
heterotachy was even higher between orthologs than
between paralogs. For example, we found 45 H sites in
orthologous comparisons between mammals and teleosts,
but only 21 in paralogous comparisons within teleosts.
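For readers unfamiliar with the rank test mentioned above, here is a minimal pure-Python sketch of the Mann-Whitney U statistic with midranks for ties. The function name and the example values are hypothetical and made up for illustration; they are not the data of this study, and the statistic reported in the text may follow a different convention. In practice a library routine such as `scipy.stats.mannwhitneyu` would also supply the P value.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two samples, using midranks for ties."""
    # Pool the samples, remembering which sample each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over the run of values tied with pooled[i].
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    rank_sum_x = sum(r for r, (_, label) in zip(ranks, pooled) if label == 0)
    n_x, n_y = len(x), len(y)
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    u_y = n_x * n_y - u_x
    return u_x, u_y

# Hypothetical numbers of H sites per comparison (illustrative only).
orthologous_h = [28, 45, 33, 30, 41, 36]
paralogous_h = [40, 21, 38, 44, 35, 47, 39, 42, 37]
u_orth, u_para = mann_whitney_u(orthologous_h, paralogous_h)
print(u_orth, u_para)
```

The two statistics always sum to the product of the sample sizes, which provides a quick sanity check on any implementation.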
The significance of these findings was investigated
further by observing the structural distribution of the 40 H
positions identified in mammalian paralogous comparisons
(fig. 2a; for a detailed list, see table S2 in the online
Supplementary Material). These positions appeared evenly
dispersed all over the structure, both at internal and
external locations (fig. 2a). Within the pool of H positions
there were some likely to hold high functional significance, as they presented strong constraints in one subunit
and much higher variability in the other. Consistently, the
function of such residues was critical to only one chain
(see table S2 in the online Supplementary Material). This
was the case of six positions lying at inter-subunit contact
surfaces. For example, leucine α40 presented no substitutions over the whole mammalian tree, whereas its β39
homolog was much more variable. This site is crucial in
the α chain as it interacts with histidine β146 at the sliding
interface. Instead, we found no functional indication for
residue β39. Similarly, position β60 displayed a remarkably conserved valine over the whole mammalian tree,
whereas its α homolog switched to variable amino acids on
different branches. A valine to glutamate mutation at this
site is reported to lead to a highly unstable β globin
responsible for a severe form of thalassemia (Podda et al.
FIG. 2.—Distribution of mammalian heterotachous and constant but
different (CBD) sites onto human hemoglobin 3D structure. Sites
belonging to α subunits (A and C chains) are colored in yellow and
green, respectively; sites belonging to β subunits (B and D chains) are
colored in blue and magenta, respectively; hemes are in red. H (a) and
CBD (b) sites are displayed with the spacefill option, whereas only the
backbone of the molecules is shown. The structure presented is a human
adult hemoglobin in the deoxy conformation (Tame and Vallone 2000),
with PDB accession number: 1A3N. The four types of categories
(constant, homotachous, heterotachous, and CBD) between mammalian
hemoglobin α and β are shown in fig. S1 (see Supplementary Material
online).
This might be the case for a number of previous studies
(Gu 1999, 2001; Gaucher, Miyamoto, and Benner 2001;
Knudsen and Miyamoto 2001; Gaucher et al. 2002a,
2002b), and for a recent analysis of globins (Naylor and
Gerstein 2000). However, compared to this analysis, our
substitution rate estimates were more accurate because
they were calculated by using about four times as many
sequences and much larger evolutionary distances (i.e.,
a broader taxonomic sampling). Moreover, none of the
above studies challenged the reciprocal hypothesis, i.e.,
that heterotachy is absent when function is preserved.
Indeed, the opposite was recently demonstrated when
~95% of variable positions in vertebrate cytochrome
b were found to be heterotachous, using a sample of 2,000
sequences (Lopez, Casane, and Philippe 2002), even though
a functional change of this enzyme within vertebrates is
unlikely. Thus, although heterotachy certainly harbors
a strong functional component, this may not be specifically
related to functional divergence. Instead, it may more
generally reflect a less specific process related to the many
intramolecular and intermolecular interactions compatible
with a range of equally viable protein conformations. This
latter hypothesis is consistent with Fitch’s pioneering
theory on the non-independence of substitutions in proteins
(Fitch and Markowitz 1970).
To find more genuine signatures of functional
divergence between the two globin subunits, we turned
to positions harboring strong evolutionary constraints in
both paralogs. Among them, we selected those that
displayed different amino acid states in each paralog.
Accordingly, we named them “constant but different”
(CBD). As shown in figure 1c, we found that, in contrast to
H sites, CBD sites were overrepresented in paralogous
comparisons with respect to orthologous ones, with a mean
of ~10% and ~2%, respectively. The nonparametric
Mann-Whitney rank test suggested that the difference was
significant (w = 23; P < 0.002). For example, although
a total of 15 and 13 such positions were identified in
paralogous comparisons in Sauropsida and mammals,
respectively, only one was found in orthologous comparisons between Sauropsida and mammals (see fig. 1c). This
evidence seems to indicate a likely involvement of CBD
positions in the specialization of the two globin families.
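The CBD notion can be sketched directly on alignment columns. The short Python example below flags columns where each group shows a single residue but the two groups differ. It is a simplification made for illustration: it uses observed states rather than substitutions inferred on a tree, and the function names and toy sequences are hypothetical, not taken from the alignments of this study.

```python
def column_states(alignment, col):
    """Set of residues observed at one column, ignoring gaps."""
    return {seq[col] for seq in alignment if seq[col] != "-"}

def cbd_columns(group_a, group_b):
    """Return (column, residue_a, residue_b) for every column that is
    invariant within each group but differs between the two groups."""
    hits = []
    for col in range(len(group_a[0])):
        states_a = column_states(group_a, col)
        states_b = column_states(group_b, col)
        if len(states_a) == 1 and len(states_b) == 1 and states_a != states_b:
            hits.append((col, next(iter(states_a)), next(iter(states_b))))
    return hits

# Toy alignments of equal length (hypothetical sequences).
alpha_like = ["LYRA", "LYRG", "LYRS"]
beta_like = ["LRHA", "LRHT", "LRHA"]
print(cbd_columns(alpha_like, beta_like))
```

In this toy case the first column is constant and identical in both groups, the second and third are constant but different, and the fourth is variable, so only the two middle columns are reported.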
To confirm this prediction, we studied the distribution
of CBD sites onto the hemoglobin quaternary structure.
When superimposed onto the 3D structure of human adult
hemoglobin, the 13 CBD sites identified in mammalian
paralogous comparisons (for a detailed list, see table S2 in
the online Supplementary Material) were concentrated at
non-exposed locations (fig. 2b). This concentration was
confirmed by the fact that almost all of them (10/13) were
indeed reported to occupy contact surfaces (see table S2 in
the online Supplementary Material), such as the central cavity,
ligand binding pockets, and inter-subunit contacts. In
particular, six CBD sites were directly involved in both
α1β2 (sliding) and α1β1 (packing) interfaces (Perutz 1970;
Shionyu, Takahashi, and Go 2001). For example, tyrosine
α41 and its homolog arginine β40 were identified as
a highly constrained CBD couple in mammals. These sites
interact with each other at the sliding surface in the oxy
state. Another case is that of arginine α141 and its
Supplementary Material
The following material is available online: figure S1:
Distribution on human hemoglobin 3D structure of the four
evolutionary categories of mammalian globins sites, H,
CBD, homotachous, and constant; table S1: Solvent
accessibility and site categories for all the positions of
mammalian globins; table S2: Structural and mutational
data for mammalian CBD and H sites. Also, alignments and
accession numbers are available as online Supplementary
Material.
Acknowledgments
We acknowledge Pierre Tuffery for kindly calculating side chain solvent accessibility, and Eric Bapteste and
Franz Lang for careful reading of the manuscript. S.G. was
supported by a poste de chercheur associé from CNRS.
This work was supported by a grant from the programme
inter-EPST bioinformatique.
Literature Cited
Berman, H. M., J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat,
H. Weissig, I. N. Shindyalov, and P. E. Bourne. 2000. The
Protein Data Bank. Nucleic Acids Res. 28:235–242.
Bickel, P. J., K. J. Kechris, P. C. Spector, G. J. Wedemayer, and
A. N. Glazer. 2002. Finding important sites in protein
sequences. Proc. Natl. Acad. Sci. USA 99:14764–14771.
Blouin, C., Y. Boucher, and A. J. Roger. 2003. Inferring functional constraints and divergence in protein families using 3D
mapping of phylogenetic information. Nucleic Acids Res. 31:
790–797.
Casari, G., C. Sander, and A. Valencia. 1995. A method to predict
functional residues in proteins. Nat. Struct. Biol. 2:171–178.
Eisen, J. A. 1998. Phylogenomics: improving functional
predictions for uncharacterized genes by evolutionary analysis. Genome Res. 8:163–167.
Eisenberg, D., E. M. Marcotte, I. Xenarios, and T. O. Yeates.
2000. Protein function in the post-genomic era. Nature
405:823–826.
Enright, A. J., and C. A. Ouzounis. 2001. Functional associations
of proteins in entire genomes by means of exhaustive
detection of gene fusions. Genome Biol. 2:Research0034.
Felsenstein, J. 1985. Phylogenies and the comparative method.
Am. Nat. 125:1–15.
Fitch, W. M., and E. Markowitz. 1970. An improved method for
determining codon variability in a gene and its application to
the rate of fixation of mutations in evolution. Biochem. Genet.
4:579–593.
Gaucher, E. A., U. K. Das, M. M. Miyamoto, and S. A. Benner.
2002a. The crystal structure of eEF1A refines the functional
predictions of an evolutionary analysis of rate changes among
elongation factors. Mol. Biol. Evol. 19:569–573.
Gaucher, E. A., X. Gu, M. M. Miyamoto, and S. A. Benner.
2002b. Predicting functional divergence in protein evolution
by site-specific rate shifts. Trends Biochem. Sci. 27:315–321.
Gaucher, E. A., M. M. Miyamoto, and S. A. Benner. 2001.
Function-structure analysis of proteins using covarion-based
evolutionary approaches: elongation factors. Proc. Natl. Acad.
Sci. USA 98:548–552.
Gu, X. 1999. Statistical methods for testing functional divergence
after gene duplication. Mol. Biol. Evol. 16:1664–1674.
———. 2001. Maximum-likelihood approach for gene family
evolution under functional divergence. Mol. Biol. Evol.
18:453–464.
Hannenhalli, S. S., and R. B. Russell. 2000. Analysis and
prediction of functional sub-types from protein sequence
alignments. J. Mol. Biol. 303:61–76.
Hardison, R. C., D. H. Chui, B. Giardine, C. Riemer, G. P.
Patrinos, N. Anagnou, W. Miller, and H. Wajcman. 2002.
HbVar: a relational database of human hemoglobin variants
and thalassemia mutations at the globin gene server. Hum.
Mutat. 19:225–233.
Jensen, L. J., M. Skovgaard, and S. Brunak. 2002. Prediction of
novel archaeal enzymes from sequence-derived features.
Protein Sci. 11:2894–2898.
Kimura, M. 1983. The neutral theory of molecular evolution.
Cambridge University Press, Cambridge.
Knudsen, B., and M. M. Miyamoto. 2001. A likelihood ratio test
for evolutionary rate shifts and functional divergence among
proteins. Proc. Natl. Acad. Sci. USA 98:14512–14517.
Lichtarge, O., H. R. Bourne, and F. E. Cohen. 1996. An
evolutionary trace method defines binding surfaces common
to protein families. J. Mol. Biol. 257:342–358.
Lichtarge, O., and M. E. Sowa. 2002. Evolutionary predictions of
binding surfaces and interactions. Curr. Opin. Struct. Biol.
12:21–27.
Lopez, P., D. Casane, and H. Philippe. 2002. Heterotachy, an
important process of protein evolution. Mol. Biol. Evol. 19:
1–7.
Lopez, P., P. Forterre, and H. Philippe. 1999. The root of the tree
of life in the light of the covarion model. J. Mol. Evol.
49:496–508.
Marcotte, E. M., M. Pellegrini, H. L. Ng, D. W. Rice, T. O.
Yeates, and D. Eisenberg. 1999. Detecting protein function
and protein-protein interactions from genome sequences.
Science 285:751–753.
homolog histidine β146, both of which presented no
substitutions over the whole mammalian tree. These sites
are involved in crucial interactions with different residues
in the deoxy state. The high proportion of CBD positions
at inter-subunit surfaces supports their role as potential
indicators of functional divergence, because the refinement
of interactions at these interfaces played a fundamental
role in the evolution of critical functions such as
modulation of oxygen affinity and cooperative binding
(Perutz 1970). It will be interesting to verify whether CBD
sites have the same critical role in proteins that do not
oligomerize, once their representation in public databases
is sufficient to allow an analysis similar to that
presented here. Only three CBD sites occupied external
locations on human hemoglobin (fig. 2b; see also table S2
in the Supplementary Material online). Because the reason
for their strong conservation on the heterotetramer surface
is not obvious, and because they are probably involved in
functional divergence, they might represent promising
candidates for further experimental studies.
In conclusion, our study underlines the power of
integrating evolutionary analyses with structural data in
functional prediction studies. Specifically, we identify CBD
positions as more reliable markers of functional specialization than heterotachous sites. Although heterotachy
remains a phenomenon worthy of further investigation
for understanding the dynamics of protein evolution, the
study of CBD sites may represent a novel and promising
direction for genomic studies aimed at dissecting the
function of members of large multigene families.
Naylor, G. J., and M. Gerstein. 2000. Measuring shifts in function
and evolutionary opportunity using variability profiles: a case
study of the globins. J. Mol. Evol. 51:223–233.
Ohno, S., U. Wolf, and N. B. Atkin. 1968. Evolution from fish to
mammals by gene duplication. Hereditas 59:169–187.
Pellegrini, M., E. M. Marcotte, M. J. Thompson, D. Eisenberg,
and T. O. Yeates. 1999. Assigning protein functions by
comparative genome analysis: protein phylogenetic profiles.
Proc. Natl. Acad. Sci. USA 96:4285–4288.
Perutz, M. F. 1970. Stereochemistry of cooperative effects in
haemoglobin. Nature 228:726–739.
Philippe, H. 1993. MUST, a computer package of management utilities for sequences and trees. Nucleic Acids Res. 21:
5264–5272.
Philippe, H., and P. Lopez. 2001. On the conservation of
protein sequences in evolution. Trends Biochem. Sci. 26:
414–416.
Podda, A., R. Galanello, L. Maccioni, M. A. Melis, C. Rosatelli,
L. Perseu, and A. Cao. 1991. Hemoglobin Cagliari (beta 60
[E4] Val→Glu): a novel unstable thalassemic hemoglobinopathy. Blood 77:371–375.
Sayle, R. A., and E. J. Milner-White. 1995. RasMol: biomolecular graphics for all. Trends Biochem. Sci. 20:374–376.
Shionyu, M., K. Takahashi, and M. Go. 2001. Variable subunit
contact and cooperativity of hemoglobins. J. Mol. Evol.
53:416–429.
Tame, J. R., and B. Vallone. 2000. The structures of deoxy
human haemoglobin and the mutant Hb Tyralpha42His at 120
K. Acta Crystallogr. D. Biol. Crystallogr. 56:805–811.
Yang, Z. 1997. Phylogenetic Analysis by Maximum Likelihood
(PAML), Version 1.3. Department of Integrative Biology,
University of California at Berkeley.
———. 2002. Inference of selection from multiple species
alignments. Curr. Opin. Genet. Dev. 12:688–694.
Brian Golding, Associate Editor
Accepted May 14, 2003