Evolution of protein fold in the presence of functional constraints

Antonina Andreeva and Alexey G Murzin
The functional requirement to form and maintain the active site
structure probably exerts a strong selective pressure on a
protein to adopt just one stable and evolutionarily conserved
fold. Nonetheless, new evidence suggests the likelihood of
protein fold being neither physically nor biologically invariant.
Alternative folds discovered in several proteins are composed
of constant and variable parts. The latter display context-dependent conformations and a tendency to form new
oligomeric interfaces. In turn, oligomerisation mediates fold
evolution without loss of protein function. Gene duplication
breaks down homo-oligomeric symmetry and relieves the
pressure to maintain the local architecture of redundant active
sites; this can lead to further structural changes.
Addresses
MRC Centre for Protein Engineering, Hills Road, Cambridge CB2
2QH, UK
Corresponding author: Murzin, Alexey G ([email protected])
Current Opinion in Structural Biology 2006, 16:399–408
This review comes from a themed issue on
Sequences and topology
Edited by Nick V Grishin and Sarah A Teichmann
Available online 2nd May 2006
0959-440X/$ – see front matter
© 2006 Elsevier Ltd. All rights reserved.
DOI 10.1016/j.sbi.2006.04.003
Introduction
“No new principle will declare itself from below a heap of facts”
Peter Medawar
An understanding of natural history at the molecular level
derives from knowledge of probable evolutionary relationships, as deduced from sequence, structural and functional similarities among known genes and their products.
The functional requirement to form and maintain
a definite active site structure probably exerts a
strong selective pressure on a protein to adopt just one
stable and conserved fold. Therefore, protein structures
are generally regarded as ‘fossil records’ of molecular
evolution. Systematic analysis of accumulated structural
data has provided many insights into the evolution of protein
structure and function [1]. Yet our knowledge of the field
has remained a collection of miscellaneous ‘facts of protein
fold evolution’ that could fit very different theoretical
scenarios [2–4], as evolutionary considerations have thus far
played little if any role in the selection of proteins for
structure determination. The combination of structural
genomics and targeted structural studies promises to fill
the gaps in this knowledge. It has already made an impact
on the growth rate and composition of structural data, with
a significant increase in the number of protein families of
known structure [5–7]. For this review, we have elected
to discuss those families whose non-trivial structural relationships have enabled new advances in our understanding
of how a protein fold can change without compromising
the integrity of the functional site structure.
Shifting paradigm of protein fold
A protein fold is a simplified representation of protein
structure that was originally intended to be invariant to
possible conformational changes. The potential flexibility
of protein loops and other peripheral regions, and their
structural variability in homologous proteins were recognised a long time ago. Therefore, the fold of a protein was
defined by the composition, architecture and topology of
its core secondary structure elements (i.e. α helices and/or
β strands). The discovery of chameleon sequences that
can adopt alternative secondary structures in the same
protein, thus affecting its composition, has already shaken
the presumed invariance of protein fold [8,9]. Recent
structures revealed even more remarkable examples of
large-scale fold variations, altering the protein architecture and topology. These examples provide new insights
into protein fold evolution and may be of value in the
quest for an evolving paradigm of changeable fold.
Chameleon sequences and alternative folds
The spindle assembly checkpoint protein Mad2 is one of
the best-studied proteins shown to adopt two distinct
folded conformations in native conditions. Previously,
they were thought to represent two different functional
states: one is the free protein, whereas the other is the
ligand-bound form [10]. Recently, both conformations
were shown to be at equilibrium in ligand-free Mad2
[11]. Comparison of the two alternative conformations
revealed a compact common core comprising about 70%
of the Mad2 protein chain. The rest of the chain undergoes a major conformational change involving the refolding and translocation of the C-terminal β-structure from
one end of the main β-sheet to the other end. In addition,
there are flexible regions, the sequence locations of which
differ between the conformations. The conformational
flexibility of the Mad2 C terminus appears to have a
functional role. Its hinged mobile elements wrap around
the elongated ligands like molecular ‘safety belts’ [12].
A new chameleon motif has been found in the AXH
domain, a common region of the ataxin-1 and HBP1
proteins implicated in binding RNA. The structures of
the AXH domains of both proteins have been published
recently [13,14]. The ataxin-1 domain forms a dimer in
the crystals, with the two subunits adopting similar but
non-identical conformations. The most different parts are
the N-terminal tails, which are found at the subunit
interface and interact complementarily along the
pseudo-twofold axis. This observation of a dimeric interface formed by two mutually adapting yet different
chameleon motifs is novel. It is thought that this structural adaptability is essential in maintaining the AXH
dimer [13]. The solution structure of the monomeric
HBP1 AXH domain has revealed an even larger conformational variability in the N-terminal region, compared
with the ataxin-1 domain structures. It retains similar
secondary structure, but otherwise adopts a very different
conformation and occupies a different position relative to
the C-terminal common core. It should be noted here that
the relationship between the monomeric HBP1 and
dimeric ataxin-1 AXH domains is unlike the domain-swapping relationships discussed below.
Structural comparison of the cyanobacterial circadian
clock proteins SasA and KaiB underlines a possible role
for chameleon sequences in protein fold evolution. SasA
is a histidine kinase that contains an additional N-terminal domain, N-SasA, whose sequence is very similar to
the entire KaiB sequence (Figure 1). Both N-SasA and
KaiB interact with the KaiC component of the same
molecular clock. The isolated N-SasA domain is a monomer in solution and adopts a thioredoxin-like fold [15].
Surprisingly, despite its significant similarity to N-SasA
and detectable similarity to other thioredoxin-like structures, KaiB adopts a quite different fold [16,17]. There is
a good correlation between sequence and structure for the
N-terminal halves of both proteins (which comprise a
common βαβ unit), whereas the C-terminal portions of
the KaiB and N-SasA folds differ substantially in their
secondary structure and topology. Instead of the C-terminal ββα motif of N-SasA, KaiB has an ααβ motif
(Figure 1). In the KaiB crystals, the C-terminal motif
mediates the formation of a tetramer thought to be the
KaiB biological unit [17,18]. The apparent homology
between N-SasA and KaiB, and their probable remote
homology to thioredoxins suggest the hypothesis that
their common ancestor is a thioredoxin-like protein that
evolved a chameleon sequence. Several critical amino
acid substitutions in the KaiB lineage probably tipped the
balance toward the formation of different local structures
and facilitated the acquisition of a new protein function.
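The comparison above can be made concrete with a small sketch. Assuming each domain is written as a string of secondary structure elements (β for a strand, α for a helix), the shared N-terminal unit is the longest common prefix and the divergent C-terminal motifs are what remains. The element strings follow the text; the function and variable names are illustrative only.

```python
# Illustrative sketch: element strings are taken from the text above
# (N-terminal βαβ unit, then ββα in N-SasA or ααβ in KaiB);
# the function name is hypothetical.
def split_common_prefix(a: str, b: str) -> tuple[str, str, str]:
    """Return (shared prefix, remainder of a, remainder of b)."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n], a[n:], b[n:]


N_SASA = "βαβ" + "ββα"
KAIB = "βαβ" + "ααβ"

common, sasa_tail, kaib_tail = split_common_prefix(N_SASA, KAIB)
print(common, sasa_tail, kaib_tail)
```

A string of element types is a deliberately coarse description: it captures composition and order but not topology, so it can only flag where two folds start to differ.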
Variable structural parts and oligomerisation
The above examples show that the entire protein fold,
including every secondary structure element, is not
necessarily invariant, but there may be a constant part.
The remaining structure may vary depending on both the
constant part and the external conditions (binding
ligands, oligomerisation state and so on). The involvement of variable structures in the oligomerisation of the
AXH domain, KaiB and, probably, Mad2 may reflect a
more general phenomenon. A remarkable yet overlooked
example of this phenomenon was revealed by crystal
structures of the oxygen-dependent coproporphyrinogen
oxidase (CPO) from Saccharomyces cerevisiae (Hem13p)
[19]. Its protein fold features a large central antiparallel
sheet that is flanked by helices (Figure 2). The
C-terminal, mainly helical segment contributes to the
active site and forms the dimerisation interface, which
is observed in two different crystal forms capturing the
open and closed conformations of the active site cleft. A
third crystal form contains a very different dimer. More
than 30% of the subunit structure, including almost the
entire C-terminal segment, is disordered and displaced to
create an alternative, more intimate dimerisation interface. A very similar dimeric structure has been observed
independently in a similar but distinct crystal form (PDB
code 1txn), suggesting that this dimer, which incidentally
produces better-diffracting crystals, is not a crystallisation
artefact. This allows the hypothesis that CPO from other
organisms may also form similar dimers, suggesting an
alternative explanation for the deleterious effect of many
of the mutations associated with the disease hereditary
coproporphyria [19,20].
Figure 1
Comparison of (a) the sequences and (b) the structures of the SasA N-terminal domain (left, PDB code 1t4y) and the KaiB subunit (right,
PDB code 1r5p). FASTA alignment of the two proteins is shown with identical residues highlighted and structurally equivalent segments boxed.
Structures are rainbow coloured (blue→red) from the N to C termini, so aligned segments of the two proteins are the same colour. The structural
cartoons in all figures were produced with PyMOL (http://www.pymol.org) [48].
Figure 2
Two different dimerisation modes of yeast CPO. (a) Probable biological unit in the closed form and (b) the self-inhibited ‘tight’ dimer. One
rainbow-coloured subunit is shown in approximately the same orientation in both dimers.
Non-homologous recombination
The proposed division of protein fold into constant and
variable parts, and the probable role of the latter in the
formation of new oligomeric interfaces gain further support from the structure of the chimeric 1B11 protein
obtained by non-homologous recombination [21,22].
This protein is composed of two subdomain-size segments, one from the cold shock protein CspA and the
other from ribosomal protein S1 (Figure 3). Incidentally,
these two proteins are distantly related, sharing the
nucleic-acid-binding OB-fold. Remarkably, the CspA
and S1 segments form practically equivalent rather than
complementary parts of this common fold, but there is no
fold similarity between the ‘parent’ fragments in the 1B11
structure, which comprises a tetramer made of two segment-swapped dimers. The six-stranded β-barrel of the
1B11 compact core resembles the ‘parent’ five-stranded
OB-fold barrel. The structure of the CspA segment is
retained and superimposes well on the equivalent region
of the CspA original structure, whereas the S1 secondary
structures are rearranged to complete the β-barrel structure (Figure 3). Thus, the structure of the CspA segment
can be seen as a template around which the rest of the
protein folds. Besides the completion of the barrel fold,
the S1 segment forms a compact structure composed of
four identical chains interlocking around a new hydrophobic core in the centre of the tetramer.
Figure 3
Structures of the combinatorial 1B11 protein and its ‘parents’. (a) Cold
shock protein CspA and (b) a domain of the ribosomal S1 protein
share the common OB-fold. (c) One 1B11 subunit is composed of
the N-terminal fragment of CspA (yellow) and the equivalent region of
the S1 domain (blue); in the other subunit of the segment-swapped
dimer, these regions are shown in orange and cyan, respectively.
The second swapped dimer that completes the 1B11 tetramer is not
shown.
The successful in vitro demonstration of the non-homologous recombination of subdomain-size segments to
yield a stable protein of novel fold has enormous implications for our understanding of protein evolution. In theory, such recombination can happen readily on the DNA
level, but it seemed unlikely to result in a viable protein.
The ability of one segment to act as a template for the
folding of the rest of the resulting polypeptide would
increase the likelihood of successful non-homologous
recombination in vivo, particularly if this segment contains all essential functional determinants of a parent
protein. In principle, many proteins that contain subdomains with similar structures and functions, such as the
DNA-binding HTH motif or RNA-binding KH and S4
(or αL) motifs, but otherwise display different protein
folds may have evolved by non-homologous recombination of these motifs with other protein segments. However, it remains to be demonstrated that non-homologous
recombination involving an intact subdomain can yield a
functional protein. The recombinant 1B11 protein probably does not retain the nucleic-acid-binding activity of
the parent proteins, as the putative RNA-binding site of
CspA is buried in the tetramer.
Evolution of structural complexity
The remarkable ‘spontaneous’ oligomerisation of 1B11
appears to be a common feature of other combinatorial
proteins selected by proteolytic stability [22]. This challenges the conventional paradigm of the evolution of
oligomeric complexes from monomeric proteins [23].
Indeed, recent evidence suggests that the first oligomers
were produced before genetics, as we know it, was at work.
Oligomeric biological units
There are examples of oligomeric proteins whose structures almost certainly predated their functions. Such
oligomers are composed of interlocking monomers that
are not compact by themselves, and cannot be rearranged
into compact and contiguous units by exchanging the
equivalent segments between different monomers. The
unanticipated possibility of such interlocked oligomers
presents a certain problem for their structure determination in solution; this has already resulted in a few known
and probable structural artefacts [24,25]. There are other
proteins whose oligomeric structures most likely preceded their functions. Composed of more compact monomers, they contain multiple equivalent active sites at the
subunit interfaces. For example, in tetrameric flavin-dependent thymidylate synthase (ThyX), each of its four
equivalent active sites is formed by three different subunits [26,27]. As each subunit contributes to three different active sites, its functionally essential residues are
distributed over a large area of the subunit surface and
therefore are very unlikely to have evolved in a functionless ThyX monomer.
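The counting behind this argument can be checked with a short sketch. It assumes, purely for illustration, that each active site of the tetramer is built from a distinct set of three of the four subunits; the subunit labels A to D are hypothetical.

```python
from itertools import combinations

# Assumption for illustration: every active site is formed by a distinct
# triple of the four subunits (labels A-D are hypothetical).
subunits = ["A", "B", "C", "D"]
sites = list(combinations(subunits, 3))

# Number of sites each subunit contributes to.
contributions = {s: sum(s in site for site in sites) for s in subunits}

print(len(sites))      # number of distinct triples
print(contributions)   # sites per subunit
```

Under this assumption the sketch reproduces the figures quoted above: four sites, with every subunit taking part in three of them.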
These ‘ancient’ oligomeric biological units are common
and frequently observed in new protein structures.
Importantly, in these oligomers, segment swapping and
gene duplication can occur without changing the overall
and active site architectures, as illustrated by three structures from the AhpD-like superfamily (Figure 4). The
alkylhydroperoxidase AhpD is a metabolic enzyme
linked to antioxidant defence in mycobacteria [28,29].
It has a thioredoxin-like active site, but an unrelated all-α
fold. AhpD monomers are composed of two structural
repeats and assemble into a symmetrical homotrimer.
The AhpD oligomeric architecture is strikingly similar
to the hexameric architecture of TM1620 protein from
the carboxymuconolactone decarboxylase (CMD) family
(PDB codes 1p8c, 1vke), which also shows good sequence
and structural similarity to AhpD in the active site region.
The TM1620 subunit adopts a segment-swapped variant
of the fold of one AhpD repeat. Surprisingly, the pair of
TM1620 monomers that swap helices is not the same as
the pair that corresponds to one AhpD monomer. More
recently, the structure of another member of the CMD
family, TTHA0727 [30], provided a halfway house on the
route between the TM1620 and AhpD structures.
TTHA0727 is a hexamer, like TM1620, but its subunit
fold resembles one AhpD repeat without any segment
swapping.
Figure 4
Segment swapping and gene duplication in the oligomeric biological unit. The architectures of the CMD family hexameric proteins (a) TM1620
and (b) TTHA0727, and (c) trimeric alkylhydroperoxidase AhpD. Helices that are common to all three proteins are shown in colour and grey; other
helices are white. The structural repeats of one AhpD subunit are shown in cyan and blue, the corresponding TTHA0727 subunits are in yellow
and green, and their TM1620 counterparts are in violet and magenta. The ‘swapped’ helical segment in a third TM1620 subunit is shown in purple.
Symmetry and duplication
Homo-oligomeric biological units usually contain multiple equivalent active sites that can be of very complex
architecture. With one active site per monomer, they
generally have higher ratios of active sites per residue
than monomeric enzymes. This organisation may prove
advantageous, provided there is a selective pressure to
limit the amount of DNA in the cell. On the other hand,
the oligomer symmetry imposes constraints on evolutionary changes of its structure that can hinder further optimisation and expansion of its functions. The higher exact
symmetry of homo-oligomers can be and frequently is
broken by duplication, as seen in related structures with
the same or similar function. For example, the recent
structure of the TusBCD complex, a mediator of thiouridine modification of tRNA, revealed a heterohexamer of
homologous but non-identical subunits related to the
homohexameric YchN protein, which is implicated in
sulfur metabolism [31,32]. The original D3 symmetry
of the YchN hexamer is reduced in the TusBCD complex, which retains just one exact twofold symmetry axis.
Of the six equivalent putative active sites in YchN, only
two remain functional in TusBCD. The loss of redundant
sites releases additional surface area available for interaction with other components of the sulfur relay system.
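The symmetry reduction described above can be illustrated with a small group-theory sketch. It assumes that the six subunits of a D3 hexamer are permuted by the six symmetry operations in the same way the group acts on itself, and it models D3 as the permutations of three objects. The choice of which twofold operation survives is arbitrary, and nothing here is derived from the actual YchN or TusBCD coordinates.

```python
from itertools import permutations


def compose(p, q):
    """Permutation obtained by applying q first, then p."""
    return tuple(p[i] for i in q)


def orbits(operations, points):
    """Partition points into sets related by the given group of operations."""
    seen, result = set(), []
    for x in points:
        if x not in seen:
            orbit = {compose(g, x) for g in operations}
            seen |= orbit
            result.append(orbit)
    return result


# D3 is isomorphic to the group of all permutations of three objects.
d3 = list(permutations(range(3)))
# Assumed surviving symmetry: the identity plus one twofold operation.
c2 = [(0, 1, 2), (1, 0, 2)]

# Each subunit is labelled by the operation that generates it
# from a reference subunit.
print(len(orbits(d3, d3)))               # classes of equivalent subunits under D3
print([len(o) for o in orbits(c2, d3)])  # class sizes with one twofold axis left
```

With the full D3 symmetry all six positions are equivalent; with a single twofold axis they fall into three pairs. That would be consistent with a complex in which one symmetry-related pair of sites stays functional while the other positions are free to diverge.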
Usually, gene duplication is followed by in-frame fusion,
resulting in a single-chain multidomain protein that
retains many structural features of the original oligomeric
unit. Generally, all but one of the active sites are lost after the
duplication and fusion events. In theory, new linkers
connecting the termini of former monomers into a single
chain can obscure some of the original sites, thus influencing the selection of the location of the remaining active
site, but there is also some evidence of random selection
[33]. The chorismate mutase domain of Escherichia coli
P-protein (EcCM) forms an interlocked homodimer that
contains two equivalent active sites at the subunit interface. The structure of the yeast chorismate mutase
(ScCM) subunit is similar to the dimeric assembly of
EcCM, suggesting a probable duplication and fusion
event in the chorismate mutase family. The structure
of secreted chorismate mutase from Mycobacterium tuberculosis (*MtCM) probably resulted from an independent
similar event. Both *MtCM and ScCM retain only one of
the two active sites of their probable EcCM-like precursors, but they lost different sites.
The loss of redundant active sites upon duplication
relieves the functional pressure to maintain their local
architecture and can lead to further structural changes.
These changes can be both subtle and extensive, as
evidenced by the structure of monomeric dUTPase from
Epstein–Barr virus and its relationship with the trimeric
dUTPase structure [34]. In the trimeric dUTPase, the
C-terminal tail of one subunit completes the active site at
the interface of the two other subunits (Figure 5). In the
monomeric dUTPase, there is virtually the same active
site between two globular domains, also completed by the
C-terminal tail. The two domains retain some similarity
to the subunit structure of the trimeric enzyme, but they
deviate from it in different ways and therefore are less
similar to each other. There is a long but compact linker
between the second domain and the C-terminal tail, a
possible remnant of the third domain, which decayed
almost completely. Analogous structural changes are proposed for the evolution of dimeric all-α dUTPase from
the ancestral tetrameric NTP pyrophosphatases of the
MazG-like superfamily [35].
Figure 5
Transition from the oligomeric biological unit to a monomeric multidomain enzyme. Cartoons of (a) the trimeric dUTPase structure, showing
monomers in different colours, and (b) monomeric dUTPase in similar colours, highlighting the relationship between its ‘domains’ and subunits
of the trimeric enzyme. dUTP molecules (space-fill) indicate the location of the active sites in both proteins.
Transient oligomers
There are many multidomain structures composed of
segment-swapped structural repeats, suggesting that they
may have evolved from monomeric single-domain proteins via transient oligomers. In theory, the segment
boundaries can be selected so that, in the swapped
oligomer, the active sites of the original monomers will
be combined in a larger symmetrical site. This new
symmetry subsequently can be utilised by evolution to
bind complex molecules that display the same symmetry
(or to stabilise the symmetrical transition states). Extant
oligomers with combined symmetrical active sites are
extremely rare. One recent example is the structure of
the putative syrohydrochlorin cobaltochelatase CbiX
(PDB code 1tjn), the swapped dimer of which bears
similarities to one monomer of the cobalt chelatase CbiK
[36]. Evidence supporting the transient existence of such
oligomers is suggested by a newly discovered relationship
between the sirtuin (Sir2) family of deacetylases, which
catalyze NAD-dependent deacetylation of modified
lysine residues in histones and other proteins [37], and
the molybdenum-cofactor-containing enzymes of the
DMSO reductase/formate dehydrogenase family [38].
The molybdenum cofactor (Mo-co) consists of two molecules of molybdopterin guanosine dinucleotide (MGD)
[38] and binds at the interface of two structurally similar
domains. The MGD-binding domains are related by a
circular permutation and arranged about the pseudo-twofold symmetry axis, coinciding with the twofold symmetry axis of Mo-co (Figure 6). They show previously
unreported structural similarities to the sirtuin NAD-binding domain, extending to the architectures of the
cofactor-binding sites and the modes of recognition of the
GDP moieties of MGD and the ADP moiety of NAD
(A Andreeva, unpublished). This suggests that these
dinucleotide-binding domains probably have evolved
from a common ancestor. A putative evolutionary pathway from an ancestral sirtuin-domain-like monomer to
the molybdenum-cofactor-binding domain includes the
formation of a segment-swapped dimer followed by a
gene duplication and fusion event.
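A circular permutation of this kind can be tested for with a very small sketch, assuming each domain is reduced to an ordered list of labelled structural segments. The labels s1 to s4 below are hypothetical placeholders, not the actual segments of the MGD-binding or sirtuin domains.

```python
def is_circular_permutation(a: list[str], b: list[str]) -> bool:
    """True if b is a rotation of a, i.e. has the same cyclic segment order.

    An identical order counts as the trivial rotation.
    """
    if len(a) != len(b):
        return False
    doubled = a + a
    return any(doubled[i:i + len(b)] == b for i in range(len(a)))


# Hypothetical segment labels, for illustration only.
domain_1 = ["s1", "s2", "s3", "s4"]
domain_2 = ["s3", "s4", "s1", "s2"]   # same cyclic order, new chain termini
shuffled = ["s2", "s1", "s3", "s4"]   # different connectivity

print(is_circular_permutation(domain_1, domain_2))
print(is_circular_permutation(domain_1, shuffled))
```

The check only compares the sequential order of equivalent segments; deciding which segments are structurally equivalent in the first place still requires a structural superposition.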
There are many other proteins of analogous domain
architectures that bind no symmetrical ligands in their
‘combined’ active sites. These proteins also may have
evolved from pre-existing functional monomers by a
similar mechanism, using duplication and fusion events
as a necessary step to break down the redundant symmetry of the hypothetical transient oligomers.
Figure 6
Structural and functional relationship between the cofactor-binding domains of the DMSO reductase/formate dehydrogenase and sirtuin
families. (a) Structure of the Mo-co domain of dissimilatory nitrate reductase (PDB code 2nap). The sphere represents the molybdenum atom,
which is bound between two MGD molecules, shown in stick representation. (b) Structure of archaeal sirtuin AF1676 with bound NAD (PDB
code 1ici). Two sets of structurally similar segments are coloured in similar hues: one in cyan and light and dark blue, the other in yellow and
light and dark orange. (c) Schematic showing the sequential order of these segments in both structures (top, 2nap; bottom, 1ici). (d) Superimposition
of the cofactor-binding sites of the MGD-binding domains (cyan and yellow) and NAD-binding domain (pink).
Evolution of structure in 4D
We have discussed the possible changes of protein structure, caused by domain (segment) swapping, duplication,
deletion (of the redundant active sites and supporting
structures) and decoration (with additional structures),
that can occur without affecting the integrity of the
remaining active site(s). A series of such ‘D-events’ in
the evolution of a protein family may produce members
with very dissimilar folds. Yet their evolutionary relationship can be traced step-by-step through extant
intermediate structures if available. We illustrate this
by a probable scenario for fold evolution in the phosphogluconate dehydrogenase (PGDH)-like suprafamily of
oxidoreductases (Figure 7). Members of this family contain two different structural domains: an N-terminal α/β
domain and a C-terminal all-α domain. There is a family-specific extension of the β-sheet of the N-terminal
domain that distinguishes it from the related NAD(P)-binding Rossmann fold domains of other oxidoreductase
families. In contrast, the C-terminal domain structures
often appear dissimilar, most notably in the larger structures of the founding member PGDH [39], class II
ketoacyl reductoisomerase (KARI) [40] and mannitol
dehydrogenase [41]. The PGDH C-terminal domain
consists of two structural repeats packed side-by-side.
The class II KARI C-terminal domain contains two
repeats of a different fold, intertwined into a ‘figure eight’
knot [42]. The C-terminal domain of mannitol dehydrogenase shows limited similarity to the PGDH and class II
KARI C-terminal domains, and contains no structural
repeats or unusual topological features.
Figure 7
Structural evolution of the PGDH-like oxidoreductases. The conserved N-terminal domains are shown in blue in all structures. In the top row,
dimeric structures of UDPGDH (left) and GDPMDH (middle) are shown with the dimerisation domains of different monomers in green and
pink; the extra C-terminal domains are removed for clarity. The structure of a GPD monomer (right); the colouring of its C-terminal domain
highlights its similarity to a compact half of the GDPMDH dimer. In the bottom row, the subunit structures of PGDH (left) and class II KARI (middle)
are shown; the internal structural repeats that correspond to parts of different subunits of the above dimers are shown in the same colours.
The structure of mannitol dehydrogenase (right) is coloured by similarity to GPD, with additional helices shown in grey. Major D-events (see text)
are shown with block arrows. Dup/Del designates a series of gene duplication and fusion events, followed by deletions of redundant domains.
In several other family members, the C-terminal domain
provides the dimerisation interface and contributes to the
active site formed by the coenzyme-binding domain of
the symmetry-related subunit. The structures of these
members can be organised in two different groups related
by domain swapping within the dimeric biological unit:
the UDP-glucose dehydrogenase (UDPGDH) group [43]
and the GDP-mannose dehydrogenase (GDPMDH)
group [44]. The ‘archetypal’ members of these groups
(i.e. UDPGDH and GDPMDH) are closely related. They
have similar sequences and extra C-terminal domains.
The UDPGDH group includes hydroxyisobutyrate dehydrogenase [45], which has extensive sequence similarity
to PGDH. The structure of one PGDH subunit closely
resembles the structure of the hydroxyisobutyrate dehydrogenase dimer without one coenzyme-binding domain
and most probably evolved from the hydroxyisobutyrate-dehydrogenase-like precursor by gene duplication and
domain deletion events. Another series of gene duplication and domain deletion events relates the subunit
structure of the class II KARI to the structure of the class
I KARI dimer [46] (from the GDPMDH group), decorated with additional helices. Finally, the GDPMDH
group has a dimer-to-monomer swapping relationship
with a third group of family members, represented by
glycerol-3-phosphate dehydrogenase (GPD) [47]. One
half of the GDPMDH dimeric fold, composed of complementary parts of different subunits enclosing one
active site, is similar to the GPD subunit fold. The
mannitol dehydrogenase structure belongs to the GPD
group and its common fold is further decorated with
additional secondary structures.
What lies ahead
The selection of newly observed structural relationships
for discussion in this review benefited from an apparent
redundancy of structural genomics efforts aimed at the
determination of a representative structure for each protein family. So far, competition between structural genomic centres and independent structural biology groups
has resulted in more than one structure having been
determined independently for almost every structurally
characterised family and, in a few cases, for the same
protein. Hopefully, this promising trend will persist. As
structural data continue to grow, one can expect to find
more and more examples of protein families that display
significant fold variations. There is the possibility of
‘accidental’ discoveries of unknown proteins with alternative stable folds. There is an even more intriguing
possibility of finding alternative folds in known proteins.
Such discoveries would help to trigger systematic
research into the (in)variability of already known protein
folds.
References and recommended reading
Papers of particular interest, published within the annual period of
review, have been highlighted as:
• of special interest
•• of outstanding interest
1. Murzin AG: How far divergent evolution goes in proteins.
Curr Opin Struct Biol 1998, 8:380-387.
2. Grishin NV: Fold change in evolution of protein structures.
J Struct Biol 2001, 134:167-185.
3. James LC, Tawfik DS: Conformational diversity and protein
evolution – a 60-year-old hypothesis revisited. Trends Biochem
Sci 2003, 28:361-368.
4. Friedberg I, Godzik A: Connecting the protein structure
universe by using sparse recurring fragments. Structure 2005,
13:1213-1224.
5. Chandonia JM, Brenner SE: The impact of structural genomics:
expectations and outcomes. Science 2006, 311:347-351.
An insider review of the completion of phase one of the Protein Structure
Initiative.
6. Wlodawer A: Giving credit where credit is due. Nat Struct Mol
Biol 2005, 12:634.
By citing this correspondence [6,7], we wish to express our support for
the proposal to assign a digital object identifier (DOI) to each PDB
entry, so that structures deposited in the PDB can be referenced and
accessed in the same manner as other electronic publications. Meanwhile, we list here the PDB identifiers of all currently unpublished structures discussed in this review: 1txn, 1p8c, 1vke, 1tjn.
7. Berman HM: Giving credit where credit is due – reply. Nat Struct
Mol Biol 2005, 12:634.
See annotation to [6].
8. Minor DL Jr, Kim PS: Context-dependent secondary structure
formation of a designed protein sequence. Nature 1996,
380:730-734.
9. Tidow H, Lauber T, Vitzithum K, Sommerhoff CP, Rosch P,
Marx UC: The solution structure of a chimeric LEKTI
domain reveals a chameleon sequence. Biochemistry 2004,
43:11238-11247.
10. Luo X, Tang Z, Rizo J, Yu H: The Mad2 spindle checkpoint
protein undergoes similar major conformational changes
upon binding to either Mad1 or Cdc20. Mol Cell 2002,
9:59-71.
11. Luo X, Tang Z, Xia G, Wassmann K, Matsumoto T, Rizo J, Yu H:
The Mad2 spindle checkpoint protein has two distinct natively
folded states. Nat Struct Mol Biol 2004, 11:338-345.
Mad2 is a relatively small, single-domain protein that shows some prion-like properties. A transiently formed heterodimer of the N1-Mad2 and N2-Mad2 conformers is converted into the wild-type N2-Mad2 homodimer.
The N2-Mad2 solution structure has been determined using a monomeric
mutant.
12. Sironi L, Mapelli M, Knapp S, De Antoni A, Jeang KT, Musacchio A:
Crystal structure of the tetrameric Mad1-Mad2 core complex:
implications of a ‘safety belt’ binding mechanism for the
spindle checkpoint. EMBO J 2002, 21:2496-2506.
13. Chen YW, Allen MD, Veprintsev DB, Löwe J, Bycroft M: The
structure of the AXH domain of spinocerebellar ataxin-1.
J Biol Chem 2004, 279:3758-3765.
See annotation to [14].
14. de Chiara C, Menon RP, Adinolfi S, de Boer J, Ktistaki E,
Kelly G, Calder L, Kioussis D, Pastore A: The AXH domain
adopts alternative folds: the solution structure of HBP1 AXH.
Structure 2005, 13:743-753.
One of many examples of the parallel determination of the same or closely
related target structures by independent groups [13]. The discovery of
unexpected structural diversity of this domain is a bonus.
15. Vakonakis I, Klewer DA, Williams SB, Golden SS, LiWang AC:
Structure of the N-terminal domain of the circadian
clock-associated histidine kinase SasA. J Mol Biol 2004,
342:9-17.
16. Garces RG, Wu N, Gillon W, Pai EF: Anabaena circadian clock
proteins KaiA and KaiB reveal a potential common binding site
to their partner KaiC. EMBO J 2004, 23:1688-1698.
17. Hitomi K, Oyama T, Han S, Arvai AS, Getzoff ED: Tetrameric
architecture of the circadian clock protein KaiB. A novel
interface for intermolecular interactions and its impact on the
circadian rhythm. J Biol Chem 2005, 280:19127-19135.
18. Iwase R, Imada K, Hayashi F, Uzumaki T, Morishita M,
Onai K, Furukawa Y, Namba K, Ishiura M: Functionally
important substructures of circadian clock protein KaiB
in a unique tetramer complex. J Biol Chem 2005,
280:43141-43149.
19. Phillips JD, Whitby FG, Warby CA, Labbe P, Yang C, Pflugrath JW,
Ferrara JD, Robinson H, Kushner JP, Hill CP: Crystal structure of
the oxygen-dependant coproporphyrinogen oxidase
(Hem13p) of Saccharomyces cerevisiae. J Biol Chem 2004,
279:38960-38968.
20. Lee DS, Flachsova E, Bodnarova M, Demeler B, Martasek P,
Raman CS: Structural basis of hereditary coproporphyria.
Proc Natl Acad Sci USA 2005, 102:14232-14237.
21. de Bono S, Riechmann L, Girard E, Williams RL, Winter G:
A segment of cold shock protein directs the folding of a
combinatorial protein. Proc Natl Acad Sci USA 2005,
102:1396-1401.
This paper reports the first structure of an artificial protein from the set
of novel folded domains generated by random shuffling of nonhomologous polypeptide segments and discusses its evolutionary
implications.
22. Riechmann L, Lavenir I, de Bono S, Winter G: Folding and stability
of a primitive protein. J Mol Biol 2005, 348:1261-1272.
The authors report the biophysical characterisation of the 1B11 combinatorial protein, the structure of which is described in [21]. They confirm
that segment swapping and associated oligomerisation are both powerful
ways of stabilising proteins, supporting the view that this may have been a
feature of early protein evolution.
23. D’Alessio G: The evolutionary transition from monomeric to
oligomeric proteins: tools, the environment, hypotheses.
Prog Biophys Mol Biol 1999, 72:271-298.
24. Bobay BG, Andreeva A, Mueller GA, Cavanagh J, Murzin AG:
Revised structure of the AbrB N-terminal domain unifies a
diverse superfamily of putative DNA-binding proteins.
FEBS Lett 2005, 579:5669-5674.
25. Coles M, Djuranovic S, Söding J, Frickey T, Koretke K, Truffault V,
Martin J, Lupas AN: AbrB-like transcription factors assume
a swapped hairpin fold that is evolutionarily related to
double-psi β-barrels. Structure 2005, 13:919-928.
26. Mathews II, Deacon AM, Canaves JM, McMullan D, Lesley SA,
Agarwalla S, Kuhn P: Functional analysis of substrate
and cofactor complex structures of a thymidylate
synthase-complementing protein. Structure 2003,
11:677-690.
27. Leduc D, Graziani S, Lipowski G, Marchand C, Le Marechal P,
Liebl U, Myllykallio H: Functional evidence for active site
location of tetrameric thymidylate synthase X at the
interphase of three monomers. Proc Natl Acad Sci USA 2004,
101:7252-7257.
Homo-oligomeric enzymes with active sites formed at the interface
of three or more monomers are rare. In this case, each monomer is
shown to contribute a catalytically essential residue, suggesting that the
ThyX tetramer may have evolved by oligomerisation of ‘inactive’ monomers.
28. Bryk R, Lima CD, Erdjument-Bromage H, Tempst P, Nathan C:
Metabolic enzymes of mycobacteria linked to antioxidant
defense by a thioredoxin-like protein. Science 2002,
295:1073-1077.
29. Nunn CM, Djordjevic S, Hillas PJ, Nishida CR, Ortiz de
Montellano PR: The crystal structure of Mycobacterium
tuberculosis alkylhydroperoxidase AhpD, a potential
target for antitubercular drug design. J Biol Chem 2002,
277:20033-20040.
30. Ito K, Arai R, Fusatomi E, Kamo-Uchikubo T, Kawaguchi S-I,
Akasaka R, Terada T, Kuramitsu S, Shirouzu M, Yokoyama S:
Crystal structure of the conserved protein TTHA0727 from
Thermus thermophilus HB8 at 1.9 Å resolution: a CMD
family member distinct from carboxymuconolactone
decarboxylase (CMD) and AhpD. Protein Sci 2006;
doi:10.1110/ps.062148506.
31. Numata T, Fukai S, Ikeuchi Y, Suzuki T, Nureki O: Structural basis
for sulfur relay to RNA mediated by heterohexameric TusBCD
complex. Structure 2006, 14:357-366.
Recent genetic studies reveal that the products of five novel genes,
tusABCDE, function in 2-thiouridine modification of tRNA wobble positions. The TusBCD complex is a dimer of heterotrimers of homologous
subunits, related to hypothetical protein YchN [32], of which only TusD
retains the catalytic cysteine residue.
32. Shin DH, Yokota H, Kim R, Kim SH: Crystal structure of a
conserved hypothetical protein from Escherichia coli.
J Struct Funct Genomics 2002, 2:53-66.
33. Ökvist M, Dey R, Sasso S, Grahn E, Kast P, Krengel U: 1.6 Å
crystal structure of the secreted chorismate mutase from
Mycobacterium tuberculosis: novel fold topology revealed.
J Mol Biol 2006, 357:1483-1499.
Structures of members of three chorismate mutase AroQ subclasses
point to divergent evolution in the distant past. The AroQβ and AroQγ
subclasses probably evolved from an ancestor of the much simpler AroQα
subclass by gene duplication and fusion events to generate different fold
variants.
34. Tarbouriech N, Buisson M, Seigneurin J-M, Cusack S,
Burmeister WP: The monomeric dUTPase from Epstein-Barr
virus mimics trimeric dUTPases. Structure 2005,
13:1299-1310.
The monomeric and trimeric dUTPases both contain the same five
characteristic sequence motifs, but in a different order. This example
of evolution from the trimeric to the monomeric enzyme is contrary to the
commonly observed trend for efficient genome usage in viruses, as the
monomeric dUTPase needs more coding sequence than a trimeric one.
35. Moroz OV, Murzin AG, Makarova KS, Koonin EV, Wilson KS,
Galperin MY: Dimeric dUTPases, HisE, and MazG belong to
a new superfamily of all-α NTP pyrophosphohydrolases
with potential ‘house-cleaning’ functions. J Mol Biol 2005,
347:243-255.
36. Schubert HL, Raux E, Wilson KS, Warren MJ: Common chelatase
design in the branched tetrapyrrole pathways of heme
and anaerobic cobalamin synthesis. Biochemistry 1999,
38:10660-10669.
37. Blander G, Guarente L: The Sir2 family of protein deacetylases.
Annu Rev Biochem 2004, 73:417-435.
38. Kisker C, Schindelin H, Rees DC: Molybdenum-cofactor-containing
enzymes: structure and mechanism.
Annu Rev Biochem 1997, 66:233-267.
39. Adams MJ, Ellis GH, Gover S, Naylor CE, Phillips C:
Crystallographic study of coenzyme, coenzyme analogue and
substrate binding in 6-phosphogluconate dehydrogenase:
implications for NADP specificity and the enzyme mechanism.
Structure 1994, 2:651-668.
40. Biou V, Dumas R, Cohen-Addad C, Douce R, Job D,
Pebay-Peyroula E: The crystal structure of plant acetohydroxy
acid isomeroreductase complexed with NADPH, two
magnesium ions and a herbicidal transition state
analog determined at 1.65 Å resolution. EMBO J 1997,
16:3405-3415.
41. Kavanagh KL, Klimacek M, Nidetzky B, Wilson DK:
Crystal structure of Pseudomonas fluorescens mannitol
2-dehydrogenase binary and ternary complexes:
specificity and catalytic mechanism. J Biol Chem 2002,
277:43433-43442.
42. Campbell RE, Mosimann SC, van De Rijn I, Tanner ME,
Strynadka NC: The first structure of UDP-glucose
dehydrogenase reveals the catalytic residues necessary for
the two-fold oxidation. Biochemistry 2000, 39:7012-7023.
43. Taylor WR: A deeply knotted protein structure and how it might
fold. Nature 2000, 406:916-919.
44. Snook CF, Tipton PA, Beamer LJ: Crystal structure of
GDP-mannose dehydrogenase: a key enzyme of alginate
biosynthesis in P. aeruginosa. Biochemistry 2003,
42:4658-4668.
45. Lokanath NK, Ohshima N, Takio K, Shiromizu I, Kuroishi C,
Okazaki N, Kuramitsu S, Yokoyama S, Miyano M, Kunishima N:
Crystal structure of novel NADP-dependent
3-hydroxyisobutyrate dehydrogenase from Thermus
thermophilus HB8. J Mol Biol 2005, 352:905-917.
46. Ahn HJ, Eom SJ, Yoon HJ, Lee BI, Cho H, Suh SW:
Crystal structure of class I acetohydroxy acid
isomeroreductase from Pseudomonas aeruginosa.
J Mol Biol 2003, 328:505-515.
47. Suresh S, Turley S, Opperdoes FR, Michels PA, Hol WG: A
potential target enzyme for trypanocidal drugs revealed by the
crystal structure of NAD-dependent glycerol-3-phosphate
dehydrogenase from Leishmania mexicana. Structure 2000,
8:541-552.
48. DeLano WL: The PyMOL Molecular Graphics System. San Carlos,
CA, USA: DeLano Scientific; 2002.