Analysis of Catalytic Residues in Enzyme Active Sites

doi:10.1016/S0022-2836(02)01036-7 available online at http://www.idealibrary.com on
J. Mol. Biol. (2002) 324, 105–121
Gail J. Bartlett1,2, Craig T. Porter1,2, Neera Borkakoti3 and
Janet M. Thornton2*†
1 Department of Biochemistry and Molecular Biology, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
2 European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
We present an analysis of the residues directly involved in catalysis in 178
enzyme active sites. Specific criteria were derived to define a catalytic
residue, and used to create a catalytic residue dataset, which was then
analysed in terms of properties including secondary structure, solvent
accessibility, flexibility, conservation, quaternary structure and function.
The results indicate the dominance of a small set of amino acid residues
in catalysis and give a picture of a general active site environment. It is
hoped that this information will provide a better understanding of the
molecular mechanisms involved in catalysis and a heuristic basis for
predicting catalytic residues in enzymes of unknown function.
© 2002 Elsevier Science Ltd. All rights reserved
3 Roche Discovery Welwyn, Broadwater Road, Welwyn Garden City, Herts AL7 3AY, UK
*Corresponding author
Keywords: enzyme active site; catalysis; amino acid residue; enzyme
function
Introduction
Enzymes are probably the most studied biological molecules. They constitute nature’s toolkit
for making and breaking down molecules required
by cells in the course of growth, repair, maintenance and death. Virtually every biological
process requires an enzyme at some point.
Enzymes are capable of carrying out complex
transformations in aqueous solution, at biological
temperatures and pH, in a stereospecific and
regiospecific manner, a feat seldom achieved by
the best of organic chemists.1 Perhaps the most
well-known enzyme catalytic mechanism is that of
the serine proteases, which contain a Ser-His-Asp
triad.2,3 This triad has evolved more than once in
different structural folds.4 Knowledge and
improved understanding of the properties of
enzyme active sites and their assorted catalytic
mechanisms are vital for novel protein design and
predicting protein function from structure.
Present address: N. Borkakoti, Medivir UK Ltd,
Peterhouse Technology Park, 100 Fulbourn Road,
Cambridge, UK.
† On secondment from the Department of
Biochemistry and Molecular Biology, University College
London, Darwin Building, Gower Street, London WC1E
6BT, UK and Department of Crystallography, Birkbeck
College, Malet Street, London WC1E 7HX, UK.
Abbreviations used: DOPS, diversity of position score;
EC, Enzyme Commission; NRDB, Non-Redundant
DataBase; PDB, Protein Data Bank.
E-mail address of the corresponding author:
[email protected]
Crystallographic and NMR studies of enzymes
have shed light on the relationship between an
enzyme’s three-dimensional structure and the
chemical reaction it performs. However, from a
structure alone it is a challenging task to extrapolate a catalytic mechanism. Detailed biochemical
information about the enzyme can be used to
design substrate or transition state analogues,
which can then be bound into the enzyme for
structure determination. These can reveal binding
site locations and identify residues that are
likely to take part in the chemical reaction. From
this, a catalytic mechanism can be proposed
and then confirmed by other information,
for example site-directed mutagenesis, kinetic
analyses and extrapolation from homologues.
Table 1. Example of information extracted for each enzyme in the dataset: carboxypeptidase D
Attribute | Information
PDB code | 1bcr
Enzyme name | Carboxypeptidase D
EC number | 3.4.16.6
CATH classification | 3.40.50.1570 (α/β-class, 3-layer (αβα) sandwich)
Reaction catalysed | Protease catalysing C-terminal hydrolysis of protein, with preference for arginine and lysine residues
Mechanism | Ser-His-Asp catalytic triad (see Figure 1)
Active site residues | (1) A Ser146: nucleophile
 | (2) B His397: primer, activates serine nucleophile; acid/base, donates proton to leaving NH group; activates water molecule for hydrolysis of covalent intermediate
 | (3) B Asp338: primer, ensures lone pair of electrons on His397 Nε2
 | (4) A Gly53-NH: transition state stabilisation, stabilises negative charge on tetrahedral intermediate
 | (5) A Tyr147-NH: transition state stabilisation, stabilises negative charge on tetrahedral intermediate
Bioactive form | Homodimer
Prosite entry | PDOC00122
Reference | Medline: 7727364
This analysis concentrates on the amino acid
residues directly involved in enzyme catalysis, as
revealed by structural studies. It builds on the
work of Zvelebil & Sternberg,5 who in 1988 performed a comparative analysis of catalytic residues
in just 17 enzymes. Since this work was published,
the number of enzyme structures in the PDB6 has
increased forty-fold†, and techniques for elucidating enzyme catalytic mechanisms have improved.
Therefore, it is appropriate to re-examine amino
acid residues involved in catalysis as well as their
properties and roles, on a wider scale. A major
problem is the complexity of the data, and the
difficulty of extracting the relevant information
from the literature. In addition, the need to cluster
proteins into related families to generate “good”
unbiased data is non-trivial. The following
properties of catalytic residues are examined:
frequency distribution of residue type, function,
secondary structure environment, solvent accessibility, flexibility, conservation, hydrogen bonding
and quaternary structure. It is hoped that these
data will improve our understanding of the generic
principles of catalysis. They provide structure-based sequence annotation, which can help
identify potential catalytic residues from structure,
and a test-bed for developing tools to predict
mechanism from structure. Such tools are the
basis for predicting the function of structures
produced by structural genomics initiatives.
Criteria and Analysis of
Catalytic Residues
Collection of dataset
A Protein Site Atlas of functional sites, including
literature-defined enzyme active sites, is currently
under construction (C.T.P. & J.M.T., unpublished
results). Starting from the EC system,7 for each EC
number (see legend to Figure 3), enzymes with
structures in the PDB were examined and, where
possible, active site residues assigned. It must be
noted that these are not simply the contents of the
SITE records of the PDB files, but contain information manually extracted from the primary literature. In order to generate a non-homologous
dataset, the CATH classification8,9 of each enzyme
was examined, and any duplicates at the CATH
“H” level (defined by structure and sequence
comparisons as having a common ancestor) were
removed. A complexity here is that CATH classifies protein domains, whereas this analysis concentrates on whole enzymes, which may have one or
more domains. Domains with identical CATH
numbers are retained in the dataset if they form
part of the same enzyme (i.e. a tandem repeat) or
if an identical domain is shared between two
multi-domain proteins with different functions.
From this list, a set of 178 enzymes was taken to form the
dataset for our work. Each enzyme has an X-ray
crystal structure (resolutions vary between 1.5 Å
and 3.2 Å) in the PDB (except for two cases which
are NMR-derived structures; PDB IDs 1mek and
1adn), a well-defined active site and a mechanism
of action proposed in the literature, usually
corroborated by site-directed mutagenesis and
other data. The information gathered for the
enzyme carboxypeptidase D10 is shown in Table 1
and Figure 1 as an example. Primary literature
used to collate this dataset and residue assignments can be found at the website‡.
† http://www.rcsb.org/pdb/holdings_table.html
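The redundancy filter described in this section can be sketched in Python. This is only an illustration under stated assumptions, not the procedure actually used to build the dataset: the `Enzyme` record, the function name and all entries other than 1bcr are hypothetical placeholders, "different functions" is approximated by comparing EC numbers, and a "multi-domain" enzyme is approximated as one with more than one distinct CATH H-level code.

```python
from typing import NamedTuple, Tuple


class Enzyme(NamedTuple):
    pdb_id: str
    ec_number: str
    cath_domains: Tuple[str, ...]  # one CATH H-level code per domain


def non_homologous(enzymes):
    """Greedy filter keeping one enzyme per CATH homologous superfamily."""
    kept = []
    for enz in enzymes:
        # A set collapses tandem repeats within one enzyme to a single entry.
        families = set(enz.cath_domains)
        redundant = False
        for other in kept:
            other_families = set(other.cath_domains)
            if not (families & other_families):
                continue  # no superfamily in common
            both_multi_domain = len(families) > 1 and len(other_families) > 1
            if both_multi_domain and enz.ec_number != other.ec_number:
                continue  # shared domain, different function: retained
            redundant = True
            break
        if not redundant:
            kept.append(enz)
    return kept


# 1bcr is taken from Table 1; every other entry is an invented placeholder.
example = [
    Enzyme("1bcr", "3.4.16.6", ("3.40.50.1570",)),
    Enzyme("hypA", "0.0.0.1", ("3.40.50.1570",)),        # homologue of 1bcr
    Enzyme("hypB", "0.0.0.2", ("9.9.9.1", "9.9.9.2")),
    Enzyme("hypC", "0.0.0.3", ("9.9.9.1", "9.9.9.3")),   # shares a domain with hypB
    Enzyme("hypD", "0.0.0.2", ("9.9.9.2", "9.9.9.2")),   # tandem repeat, homologue of hypB
]
kept_ids = [e.pdb_id for e in non_homologous(example)]
```

The greedy pass depends on input order, so a real pipeline would need an explicit rule for choosing the representative of each superfamily; the text does not state one.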
Definition of catalytic residues
Catalytic residues are not consistently defined in
the literature; therefore, the following rules were
adhered to for classifying active site residues as
catalytic.
1. Direct involvement in the catalytic mechanism, e.g. as a nucleophile.
2. Exerting an effect that aids catalysis (e.g. by
electrostatic or acid–base action) on another
residue or water molecule that is directly
involved in the catalytic mechanism.
3. Stabilisation of a proposed transition-state
intermediate.
4. Exerting an effect on a substrate or cofactor
that aids catalysis, e.g. by polarising a
bond which is to be broken. Includes steric
and electrostatic effects.
Residues that bind substrate, cofactor or metal are
not included, unless they also perform one of the
functions listed above.
Figure 1. The catalytic mechanism of carboxypeptidase D.10
‡ http://www.ebi.ac.uk/thornton-srv/databases/CATRES/index.html
Residue analysis
Solvent accessibility was calculated using
NACCESS, taking the biological molecule as
defined by either the literature or PQS,11 in the
presence and absence of ligands (either the true
substrate or a substrate analogue). Enzyme clefts
were derived using SURFNET.12
Temperature factors (B-factors) were taken from
the PDB file for each atom in a residue, and then
averaged over the whole residue. To exclude
variations between proteins, the B-factors for each
protein were then normalised over the whole protein, giving a B-factor of between 0 and 1 for each
residue. NMR models were removed for this part
of the analysis. Homologous sequences from the
Non-Redundant DataBase (NRDB, a database of
protein sequences maintained by NCBI) were
identified for each enzyme using the iterative
profile search program PSI-BLAST,13 which was
allowed a maximum of 20 iterations to reach
convergence. The E-value threshold for inclusion
of new sequences was set conservatively at 10⁻⁴⁰,
in order to minimise profile drift but maximise the
detection of remote homologues. The profile was
used as a multiple alignment to score catalytic
residue conservation, using the method of Valdar
& Thornton.14 This method uses amino acid
residue similarities inferred from a Dayhoff-like
mutation data matrix15 to assess the diversity of
amino acid residues at a given aligned position.
Only those enzymes whose alignment had a
diversity of position score (DOPS) of greater
than 90 were included in this part of the analysis.
DOPS is a measure of the number of distinct permutations of residue scores, based on Shannon’s
entropy.16 The greater the DOPS, the more diverse
the alignment, providing a more discriminating
conservation score. Catalytic residue hydrogen
bonding was investigated using HBPlus.17 The
secondary structure environment of catalytic
residues was analysed using PROMOTIF.18
Figure 2. The role of histidine in the first step of serine protease and adenylosuccinate lyase reactions.10,38 (a) Carboxypeptidase D, histidine primes serine residue for nucleophilic attack on the substrate; (b) Adenylosuccinate lyase, histidine residue directly deprotonates the substrate.
Table 2. Functional classification of catalytic residues
Catalytic function | Description
Acid–base | Involved in proton abstraction, donation or both, to or from a substrate, as a direct part of the catalytic mechanism. Excludes residues which affect other residues or water molecules in this manner
Nucleophile (a) | Forms a covalent intermediate with the substrate via nucleophilic attack
Transition state stabiliser | Stabilises the transition state in some way (e.g. by stabilising an oxyanion hole formed during ester hydrolysis), lowering the activation energy of the reaction
Activate water | Alters the pKa of, or deprotonates, a water molecule which is directly involved in the reaction
Activate cofactor | Exerts a favourable effect on a cofactor (could be metal or minor substrate such as FAD) through various means (e.g. by altering redox potential, or increasing effective charge)
Primer | Exerts a favourable effect on another residue directly involved in the catalytic mechanism, e.g. by acting as an acid or base, or through electrostatic effects
Activate substrate | Exerts a favourable effect on the substrate (e.g. by polarising a bond to be broken)
Radical | Forms a radical which is involved in the catalytic reaction
Modified | Modified in some way in order to perform catalysis during the reaction, e.g. carbamylated lysine residue
(a) This differs from the classical organic chemistry definition of a nucleophile, which is an electron pair donor. By this definition, bases would also be classified as nucleophiles; therefore, in this analysis, the definition of nucleophile applies to covalent catalysis only.
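To make the idea of scoring diversity at an aligned position more concrete, the following Python sketch computes a plain Shannon-entropy conservation score for a single alignment column. It is a deliberately simplified stand-in and should not be read as the Valdar & Thornton method or as the DOPS calculation: it ignores the mutation data matrix and any sequence weighting, and the function names and the 20-letter normalisation are assumptions made for illustration.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column; gaps ('-') are ignored."""
    residues = [r for r in column if r != "-"]
    if not residues:
        raise ValueError("column contains only gaps")
    total = len(residues)
    counts = Counter(residues)
    # sum of p * log2(1/p) over the residue types observed in the column
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def conservation(column, alphabet_size=20):
    """Map entropy onto 0..1, where 1 means a fully conserved position."""
    return 1.0 - column_entropy(column) / math.log2(alphabet_size)


# Hypothetical columns: an invariant histidine and a position split between two residues.
invariant = conservation("HHHHHH")
split = conservation("HHHDDD")
```

An entropy-only score treats a His/Asp split the same as a Leu/Ile split, which is exactly the limitation that a similarity-matrix-based method is designed to address.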
Residue function
There are many complications in assigning the
function of a catalytic residue, due to the multi-step nature of chemical reactions. One residue can
play more than one role and can be involved in
different steps of the reaction. Inevitably, catalytic
mechanisms can only be properly modelled by
quantum mechanical methods, and the “curly-arrow” diagrams are just a schematic representation. Standard organic chemistry terms for the
role a residue plays in a catalytic mechanism can
occasionally be ambiguous. For example, the role
of histidine in the first step of the serine protease
mechanism and in the adenylosuccinate lyase
mechanism is, chemically speaking, the same:
both residues are performing proton abstraction
(see Figure 2). However, the effect in adenylosuccinate lyase is on the substrate itself, while in
the serine proteases, the effect is on another protein
residue directly involved in the reaction. In the
classification proposed herein, the histidine residue
in the serine protease is described as priming
another residue in the first step, while in adenylosuccinate lyase, the histidine residue is described
as a base. In addition, some residues will achieve
the same function (e.g. lowering the pKa of another
catalytic residue) by different means (e.g. by direct
acid–base action, or by increasing the effective
charge in the locality), so it is more meaningful to
group these together. The identified catalytic residues have therefore been grouped into the classifications shown in Table 2. Some residues may
perform more than one function in a particular
reaction step, but the more functionally informative classification is chosen as the main classification. For instance, if during the course of a
reaction, a residue acts as a base and deprotonates
a substrate, and then the substrate goes on to perform nucleophilic attack on another substrate,
then the classification “activates substrate” is
chosen rather than “acid–base”. The groups can
be broadly classified into two classes, primary
and secondary. The primary groups, acid/base,
nucleophile and transition state stabiliser, are at the
forefront of the chemical reaction the enzyme performs. Residues that activate substrate, water,
cofactor or prime another residue can be thought
of as secondary catalytic residues, important for
“setting up” the reaction.
Figure 3. Structural and functional description of enzyme dataset. (a) EC wheel functional classification of dataset. The EC classification7 assigns a four digit number to the reaction catalysed, where the first digit denotes the class of reaction (green, oxidoreductases (EC 1.-.-.-); red, transferases (EC 2.-.-.-); yellow, hydrolases (EC 3.-.-.-); blue, lyases (EC 4.-.-.-); orange, isomerases (EC 5.-.-.-); pink, ligases (EC 6.-.-.-)). The second, third and fourth levels classify type of bond or substrate acted upon, substrate/product specificities and cofactor dependency. The meaning of the second, third and fourth levels is dependent on the primary level. See Todd et al.39 for more details. (b) EC wheel functional classification of all known enzymes,40 colours as in (a). (c) The CATH structural classification of the dataset. The CATH classification assigns a four digit number to each protein domain according to its secondary structure9 (red, mainly α; green, mainly β; yellow, α/β; blue, few secondary structures). See http://www.biochem.ucl.ac.uk/bsm/cath_new/cath_info.html for more details.
Results
Description of dataset
There are 178 enzymes in the dataset, and 615
catalytic residues, giving each enzyme an average
of 3.5 catalytic residues. A functional description
of the dataset is given by the EC wheel (see Figure
3(a)). The EC wheel is a visual representation of
all the EC numbers covered by the dataset. Each
ring in the concentric pie chart represents one
level of the EC classification. The primary classification (1st digit) is represented by colours and the
innermost circle. The EC wheel for all enzymes
which have been classified by the Enzyme
Commission (EC)7 is shown in Figure 3(b) for
comparison. The two datasets have similar proportions of each EC classification, although there
are slightly fewer hydrolases (EC 3.-.-.-, 28% of
dataset compared with 34% overall) and slightly
more lyases (EC 4.-.-.-, 16% of dataset compared
with 11% overall) in our dataset. This shows that
the dataset is a reasonable representation of all
known enzyme functions.
A structural description of the dataset is given
by the CATH wheel (Figure 3(c)), with the CATH
wheel for all proteins in the PDB as a comparison
(Figure 3(d)). A total of 262 out of 303 protein
domains (86%) in the dataset are fully classified in
the CATH database. Of these, approximately 2/3
of the enzymes in the dataset fall into the α/β-class
of proteins, with approximately 1/6 each in the
mainly α and mainly β classes. There are a small
number of enzymes with few secondary structures.
The dominance of α/β-structures differs from the
distribution across the whole PDB. The mainly α
and mainly β classes are both under-represented
when compared with all proteins in the PDB. It
has previously been suggested19 that the under-representation of the mainly α class is due to the
fact that in helices, the main-chain polar group
hydrogen bonding potential is fully satisfied and
these groups are not available for catalytic interactions. The edges of β-sheets are thought to be
more accessible for interactions with the substrate
and catalytic machinery. The dominance of
α/β-folds is largely due to the presence of the
nucleotide binding domain in many enzymes,
and has been seen in previous fold/function
analyses.19,20
Figure 4. Observed frequency distribution of catalytic residue types compared with all residues in the dataset. CYSH indicates free cysteine residues. CYSS indicates disulphide-bridged cysteine residues. Catalytic residues were taken from each structure. In the case of structures with multiple subunits, the smallest possible unique unit was taken, e.g. in a homodimer with catalytic residues on one subunit only, one subunit was used for the all residue calculation. If the catalytic residues were split across two subunits, and there were two active sites in the homodimer, only one subunit was used for the all residue calculation. However, if catalytic residues were split across two subunits, and there was only one active site in the dimer, both subunits were used for the all residue calculation.
Frequency distribution
Figure 4 shows the observed frequency distribution of the different types of catalytic residue,
compared with that of all residues in the dataset.
Table 3 groups these into catalytic residue types.
Overall, 65% of catalytic residues are provided
by the charged group of residues (H, R, K, E, D),
while 27% of catalytic residues are provided
by the polar group of residues (Q, T, S, N, C, Y,
W), and just 8% are provided by the hydrophobic
group of residues. This is as expected: catalysis
involves the movement of protons and electrons
and charge stabilisation, which needs electrostatic
forces provided by charged and/or polar residues.
There is no correlation between percentage abundance in the dataset and contribution to catalysis.
Table 3. Catalytic residue types and their secondary structure compared with all residues in the dataset
(Columns 2 to 4: catalytic residue type (a); columns 5 to 7: secondary structure environment)
Residues | Charged (%) | Polar (%) | Hydrophobic (%) | Alpha helix (%) | Beta sheet (%) | Coil (%)
Catalytic residues | 65 | 27 | 8 | 28 | 22 | 50
All residues | 25 | 25 | 50 | 47 | 23 | 30
(a) Histidine has been included in the charged group of residues, although strictly speaking it should be described as polar, its pKa in a protein is usually altered so that it behaves as a charged residue.
Figure 5. Catalytic propensity of residue types. Catalytic propensity is defined as the percentage of catalytic residues constituted by a particular residue type, divided by the percentage of all residues constituted by the same particular residue type.
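The propensity definition in the Figure 5 caption is a simple ratio, and a short Python sketch makes the resulting ordering explicit. The percentages are the ones quoted in the text of this section for six residue types; the function and variable names are illustrative, and the remaining residue types are omitted because their values are not quoted in the text.

```python
def catalytic_propensity(pct_of_catalytic, pct_of_all):
    """Percentage of catalytic residues of a type divided by its overall percentage."""
    if pct_of_all <= 0:
        raise ValueError("overall abundance must be positive")
    return pct_of_catalytic / pct_of_all


# (percentage of catalytic residues, percentage of all residues), as quoted in the text
quoted = {
    "His": (18.0, 2.7),
    "Asp": (15.0, 5.7),
    "Glu": (11.0, 5.9),
    "Arg": (11.0, 4.9),
    "Lys": (9.0, 5.8),
    "Cys": (5.6, 1.2),
}

propensities = {res: catalytic_propensity(cat, overall) for res, (cat, overall) in quoted.items()}
ranking = sorted(propensities, key=propensities.get, reverse=True)
```

With these inputs the ranking places histidine first and cysteine second, and arginine ahead of glutamate even though both supply 11% of catalytic residues, because arginine is the less abundant of the two.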
Histidine constitutes 18% of all catalytic residues
in proteins, although it has a low overall percentage abundance (2.7%). Histidine is particularly
suitable for carrying out catalytic reaction steps, as
it can be either charged or neutral at physiological
pH and can play the role of nucleophile, acid,
base or be involved in stabilising the transition
state of a reaction.
Aspartate and glutamate residues constitute 15%
and 11% of catalytic residues, respectively. Their
natural abundance is almost identical (5.7% and
5.9%, respectively). It could be that aspartate
is slightly favoured over glutamate
because its side-chain is shorter by one methylene group, making it less flexible so that it
can be held in place more easily, aiding catalysis.
Arginine and lysine constitute 11% and 9% of
catalytic residues, respectively. Arginine occurs
more frequently in spite of its lower natural abundance in the dataset (4.9% for arginine and 5.8%
for lysine). This preference may be due to the
three nitrogen groups in the side-chain, all of
which can perform electrostatic interactions, compared with just one in the side-chain of lysine.
Additionally, since the side-chain of arginine can
make more electrostatic interactions, it can be positioned more accurately to facilitate catalysis. The
arginine side-chain also has a good geometry to
stabilise a pair of oxygen atoms on a phosphate
group, a common biological moiety.
Cysteine constitutes 5.6% of catalytic residues,
while its natural abundance is only 1.2%.
Disulphide bridges identified by PROMOTIF18
were grouped separately, so only “free” cysteine
residues are counted in the cysteine group. Four
disulphide bridge-forming cysteine residues are
involved in catalysis; these are found in glutathione reductase and protein disulphide isomerase.
Formation and cleavage of a disulphide bridge
between the two residues in glutathione reductase
forms part of the catalytic cycle.21,22 Destabilisation
of the disulphide bridge in protein disulphide
isomerase is thought to be part of the driving
force for catalysis in this enzyme. The high proportion of catalytic cysteine residues highlights
the importance of the thiol group in catalysis. Its
-SH group is easily deprotonated to -S⁻ as it
has a pKa value of 9. Indeed, if one looks at the
catalytic propensity (or “catalycity”) of each residue (the proportion of catalytic residues/
proportion of all residues for each residue type,
see Figure 5), cysteine has the second highest catalycity behind histidine. These two residues have
the closest pKa values to biological pH of all the
amino acid residue side-chains, and this may
explain their high catalycity. Acid–base reactions
are very important in enzyme catalysis: the easier
it is to deprotonate and reprotonate a residue, the faster
it will be able to perform its catalytic function, and
the higher the turnover of the enzyme. However, it is well known that pKa values of any
amino acid residue can be altered from the standard solution value when buried within a protein
environment.23 This feature is often important in
catalysis.
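The link between pKa and ease of deprotonation can be illustrated with the Henderson-Hasselbalch relation. The Python sketch below is an idealised solution-chemistry estimate, not a model of any particular active site: the pKa of 9 is the value quoted in the text, whereas the pH of 7 (standing in for "biological pH") and the shifted pKa of 7 are assumptions chosen only to show the effect of the protein environment.

```python
def fraction_deprotonated(pka, ph):
    """Henderson-Hasselbalch: fraction of an acidic group in its deprotonated form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


ASSUMED_PH = 7.0     # assumption: stand-in for "biological pH"
THIOL_PKA = 9.0      # value quoted in the text for the cysteine -SH group
SHIFTED_PKA = 7.0    # hypothetical pKa lowered by the protein environment

unperturbed = fraction_deprotonated(THIOL_PKA, ASSUMED_PH)
perturbed = fraction_deprotonated(SHIFTED_PKA, ASSUMED_PH)
```

Under these assumptions about 1% of a thiol with a pKa of 9 is deprotonated at pH 7, rising to 50% if the environment lowers the pKa by two units, which is one way to see why environmental pKa shifts matter for catalysis.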
Figure 5 shows the catalytic propensities of the
20 amino acid residues. Histidine and cysteine
residues have the highest propensities; these are
followed by the rest of the charged residues. Glutamate moves down in the order due to its higher
abundance compared with arginine. The charged
residues are followed by the polar residues.
Tryptophan is the ninth out of 20 residues, an
unusually high position. The side-chain of tryptophan is found to be catalytic in only nine situations; however, its propensity is raised by its
very low natural abundance. After the polar
residues come the rest of the hydrophobic and
aromatic residues, as expected.
Figure 6. Catalytic propensity of residues interacting via their main-chain N–H or C=O groups.
Table 4. Hydrophobic residues aiding catalysis via their side-chain as opposed to their main-chain
Residue | Enzyme | Description of function
Met219 | Human fibroblast stromelysin-1 | Enhances effective concentration of Zn2+ cofactor, which coordinates and enhances the nucleophilicity of a hydroxyl nucleophile.41,42
Met20 | Dihydrofolate reductase | Provides a hydrophobic region pushing positive charge from N5 of folate to C6 where it can accept hydride from NADPH.37
Leu28 | Dihydrofolate reductase | Constrains folate ring in optimum position to receive hydride.37
Leu54 | Dihydrofolate reductase | Constrains folate ring in optimum position to receive hydride.37
Leu20 | D-amino-acid aminotransferase | Aids PLP cofactor catalysis by supporting the ring orientation without disturbing oscillating motions.43
Gly734 | Pyruvate formate lyase | Radical formation at the C-α position.44,45
Phe31 | Dihydrofolate reductase | Forces proximity between folate and NADPH optimising hydride transfer.37
Phe175 | L-2-haloacid dehalogenase | Forms a halide stabilising cradle which makes the halide a better leaving group.46
Phe77 | Pentalenene synthase | Stabilises carbocation intermediate with Asn219, by cation–π interactions.47
Phe50 | 4-oxalocrotonate tautomerase | Provides a hydrophobic environment to lower the pKa of the N-terminal nucleophile.48
These results are surprisingly similar to the
distribution found by Zvelebil & Sternberg5 whose
dataset included only 17 enzymes and 36 catalytic
residues. Minor differences between their results
and these are probably due to the difference in
size of the respective datasets.
Side-chain and main-chain interactions
It is useful to distinguish between side-chain and
main-chain interactions, because for main-chain
interactions the identity of the residue is often
irrelevant. For main-chain interactions, only the
N–H and C=O groups are involved, and any one
of the 19 amino acid residues (i.e. all except proline) can provide these. Figure 6 shows catalytic
propensities of the 20 residues by main-chain. The
side-chain is used by 92% of catalytic residues,
while the main-chain is used by 8%. Of those using
the main-chain, 82% use the N–H group and 18%
use the C=O group. Main-chain groups often
stabilise transition state intermediates, e.g. Gly30
in phospholipase A2.24
Glycine constitutes by far the highest proportion
of catalytic residues using the main-chain (44%). It
is often seen, as in phospholipase A2, stabilising
oxyanion holes. Glycine is ideal for this role
because of the small size of its side-chain, which
can easily fit into any gap in the active site architecture. Its N–H and C=O groups are more accessible than those of bulkier amino acid residues,
which are often occluded by the side-chain or
their positions in secondary structure. It has been
previously suggested that glycine residues provide
flexibility necessary for enzyme active sites to
change conformation.25
Figure 7. Residue solvent accessibility in the absence of ligands.
For the hydrophobic residues (M, F, L, I, G, A, P,
V), 81% of interactions involve the main-chain,
with a few notable exceptions (see Table 4). Where
hydrophobic residue side-chains are classified as
catalytic, their function is often to provide a neutral
environment to increase the relative catalytic
power of charged moieties in the same region, or
to exert steric strain on substrates which lowers
the energy of the transition state of a reaction.
Secondary structure
Table 3 shows the secondary structure distribution of catalytic residues compared with all
residues in the dataset. Half (50%) occur
in coil regions (i.e. not helix or sheet), considerably
more than expected by chance. They are found
with similar frequencies in α-helices and β-sheets
(28% and 22%, respectively). This differs from the
distribution of all residues, which has a much
higher proportion of residues in an α-helical state.
A high percentage of β-strand residues are either
in an edge strand or at the end of a strand and are
therefore available for catalytic interactions with
substrates. On the other hand, a larger fraction of
residues are “internal” to the helix, i.e. not at the
ends of the helix, and so fewer are available for
catalytic interactions. Indeed, the active site of all
the TIM barrel family of enzymes is found at the
C-terminal end of a β-barrel, with catalytic residues either at the end of a β-sheet or in the loops
connecting the β-sheets.26 These results differ significantly from those of Zvelebil & Sternberg,5
who found little difference between the secondary
structure environment of catalytic residues and all
residues in their dataset. This work probably gives
a better representation of the distribution due to
the increased size of the dataset.
Table 5. Occurrence of catalytic residues in clefts in the enzyme, as calculated by SURFNET12
Criterion | Number of enzymes (%)
≥50% of catalytic residues in three largest clefts | 151 (85%)
≥50% of catalytic residues in any cleft | 160 (90%)
At least one catalytic residue in any cleft | 165 (93%)
No catalytic residues in any cleft | 12 (7%)
Solvent accessibility
Figure 7 shows the relative solvent accessibilities
of catalytic residues compared with polar residues
and all residues in the dataset, calculated in the
absence of ligands. Of the catalytic residues, 89%
have a relative solvent accessibility (%RSA), compared to fully exposed residues, of less than 30%.
We find approximately 50% of all catalytic residues
in the 0–10% bracket, and approximately 25% in
the 10–20% bracket. Some 5% of all catalytic residues
have 0% RSA and are totally buried. One might
expect to find all catalytic residues fully exposed
on the surface of the protein, but the results show
that this is not the case. Most catalytic residues
have very small exposures to solvent. The major
factor could be the need for correct positioning and
restriction of the mobility of catalytic residues.
Figure 8. Comparison of residue solvent accessibility in the presence and absence of ligands.
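The bracket percentages discussed in this section come from binning per-residue %RSA values. A minimal Python sketch of that bookkeeping follows. It assumes that %RSA is the residue's accessible surface area expressed as a percentage of a fully exposed reference value, as the text describes; the function names are illustrative and the example values are invented purely to exercise the code, not taken from the dataset.

```python
def relative_accessibility(asa, reference_asa):
    """%RSA: accessible surface area as a percentage of a fully exposed reference."""
    return 100.0 * asa / reference_asa


def bin_label(rsa, width=10):
    """Label of the half-open bracket [lower, lower + width) containing rsa."""
    lower = int(rsa // width) * width
    return f"{lower}-{lower + width}%"


def bin_percentages(rsa_values, width=10):
    """Percentage of residues falling in each %RSA bracket."""
    counts = {}
    for rsa in rsa_values:
        label = bin_label(rsa, width)
        counts[label] = counts.get(label, 0) + 1
    return {label: 100.0 * n / len(rsa_values) for label, n in counts.items()}


# Invented %RSA values for five hypothetical catalytic residues.
example_rsa = [0.0, 3.1, 7.8, 14.2, 22.5]
brackets = bin_percentages(example_rsa)
fully_buried = 100.0 * sum(1 for r in example_rsa if r == 0.0) / len(example_rsa)
```

Running the same functions on accessibilities computed with and without the bound ligand would give the kind of before-and-after comparison summarised in Figure 8.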
Considering surface topography we find that the
majority of catalytic residues occur in a large cleft
(see Table 5). In 160 enzymes (90%), over half of
the catalytic residues are found in one of the ten
largest clefts. Of these enzymes, almost all have
over half of their catalytic residues in one of the
three largest clefts (151). The cleft environment
will lower the effective dielectric response in
the region, which will increase the stabilisation
of polar transition states by neighbouring
charged residues or metal ions.27 Binding of the
ligand serves even more to exclude the solvent
(Figure 8).
Of the structures in the dataset, 85 contain bound
substrates and/or substrate analogues and/or
inhibitors (i.e. not always the cognate ligand).
Figure 8 shows the solvent accessibility of catalytic
residues in these enzymes with and without the
ligand present. Upon ligand binding, the percentage of residues with 0– 5% relative solvent
accessibility increases from 27% to 72%. However,
we cannot take into account any change in domain
motion that occurs on substrate binding. We can
only examine the solvent accessibility of one rigid
structure with and without the ligand present.
Apo-enzymes may have exposed catalytic residues
that become buried through domain motion on substrate
binding.
Figure 9. Catalytic residue B-factors – a measure of residue flexibility. Absolute B-factors were taken from the PDB
file for each enzyme and normalised over the whole protein. Enzyme structures determined by NMR (1mek and
1adn) were excluded. Normalised B-factor values were placed into bins and the percentage of residues in each bin
displayed.
Figure 10. Normalised B-factors for individual catalytic residues (Arg, Asp, Cys, Glu, His and Lys) compared with
normalised B-factors for all residues of the same type.
Residue flexibility
B-factors in the crystal structures were used as a
measure of residue flexibility. Figure 9 shows the
absolute temperature factors of catalytic residues
and all residues in the dataset. Catalytic residues
tend to have lower B-factors than all residues,
suggesting that they have to be more rigidly
held in place than the average residue. The B-factors of catalytic
residues in enzymes without any ligand or cofactor
present (182 residues) are similar to those of all
catalytic residues, but slightly higher,
suggesting that catalytic residues become
slightly more “fixed” only when the substrate or
cofactor is bound.
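A minimal sketch of B-factor normalisation over a whole protein is given below. The exact normalisation is not restated in this section, so the sketch assumes a z-score (subtract the protein mean, divide by the protein standard deviation); the function names are hypothetical.

```python
from statistics import mean, pstdev


def normalise_b_factors(b_factors):
    """Z-score normalisation over one protein (an assumed scheme):
    0 is the protein average, negative values are more rigid than average."""
    mu = mean(b_factors)
    sigma = pstdev(b_factors)
    if sigma == 0:
        return [0.0 for _ in b_factors]
    return [(b - mu) / sigma for b in b_factors]


def mean_normalised(normalised, indices):
    """Average normalised B-factor over a subset of residues, e.g. the
    catalytic residues, given their positions in the list."""
    return mean(normalised[i] for i in indices)


# Hypothetical three-residue protein in which residue 0 is catalytic.
z_scores = normalise_b_factors([10.0, 20.0, 30.0])
catalytic_mean = mean_normalised(z_scores, [0])
```

Under this scheme a negative mean for the catalytic subset corresponds to the observation that catalytic residues tend to have lower B-factors than the average residue.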
B-factor plots for individual residue types can
be seen in Figure 10. Catalytic arginine, lysine,
aspartate and glutamate residues all have much
lower B-factors than the average for residues of the same type. Arginine could
have one or two nitrogen groups tethered while
the others perform the catalytic function. Lysine,
which normally has a very flexible side-chain, has
to be tethered for catalysis. For glutamate and
aspartate residues, one of the oxygen atoms of the
carboxylic acid group can be tethered whilst the
other performs its catalytic function. The distribution of B-factors for catalytic histidine and
cysteine residues is more similar to that of all histidine
and cysteine residues. This could be due to the
higher proportion of these residues being catalytic.
Conservation
Figure 11. Residue conservation scores. (a) Catalytic residue conservation scores compared with conservation scores
for all residues in the dataset. The conservation score ranges from 0 (least conserved) to 1 (most conserved). (b) Conservation scores in sequence and structural locality. The centre of gravity of the catalytic residues in each enzyme
was calculated and the conservation score of any residue falling within a sphere of 4 Å, 8 Å, and 12 Å of the centre of
gravity was recorded. Additionally the conservation scores of residues at sequence positions ±4, ±8 and ±12 amino
acid residues from each catalytic residue were recorded.
One hundred and ten enzymes in the dataset
produced sequence alignments that were suitably
diverse for meaningful conservation analysis. The
conservation of catalytic residues compared with
all residues is shown in Figure 11(a). Catalytic
residues are clearly more conserved than the
average residue. Figure 11(b) shows the conservation of residues within spheres of 4 Å, 8 Å and
12 Å radius around the centre of gravity of the
catalytic residues, and also the conservation of residues
±4, ±8 and ±12 sequence positions away from
each catalytic residue. The conservation of residues
falls steadily as the distance from the catalytic
residues increases. This highlights the strong
selection pressures on catalytic residues compared
with other residues in the vicinity of the active
site, which will be important for substrate recognition. Efficient catalysis depends on exquisite
positioning of critical atoms, which can often only
be achieved by using specific amino acid residues
(e.g. aspartate instead of glutamate). Additionally,
residues structurally close to catalytic residues
are more conserved than those close by in
amino acid residue sequence. One caveat is that
enzyme active sites are not necessarily spherical,
and the sphere may also pick up some buried core
residues which are conserved because they are
essential for maintaining the structural integrity of
the protein.
Table 6. Catalytic residue hydrogen bonds
Number of residues making ≥1 H-bond
                            Via –N–H or –C=O group         Via side-chain atoms
Residue (number analysed)   To protein  To ligand  Total   To protein  To ligand  Total   Total
His (107)                   86          1          87      81 (96%)    20 (24%)   84      100
Asp (93)                    74          6          77      73 (96%)    10 (13%)   76      90
Arg (67)                    55          3          55      54 (92%)    30 (51%)   59      65
Glu (65)                    59          1          59      44 (94%)    6 (12%)    47      62
Lys (55)                    49          3          50      40 (89%)    17 (38%)   45      54
Cys (38)                    32          3          32      16          2          16      33
Tyr (32)                    27          2          27      21          5          22      31
Asn (26)                    18          1          19      20          3          20      23
Ser (26)                    22          5          25      20          4          21      26
Gly (24)                    10          7          14      0           0          0       14
Thr (18)                    16          3          16      13          3          14      18
Gln (14)                    13          1          13      11          5          13      14
Trp (9)                     8           0          8       5           0          5       8
Phe (7)                     5           2          6       –           –          –       6
Leu (7)                     6           1          6       –           –          –       6
Met (4)                     2           0          2       –           –          –       2
Ala (1)                     1           0          1       –           –          –       1
Ile (2)                     2           0          2       –           –          –       2
Pro (2)                     1           0          1       –           –          –       1
Val (1)                     1           0          1       –           –          –       1
All (598)                   487         39         501     498         105        422     557 (93%)
M/C(a) (48)                 28          14         34      6           3          7       34 (71%)
Percentages shown for His, Asp, Arg, Glu and Lys are percentages of the total number making at least one hydrogen bond via side-chain atoms (e.g. 84 for histidine residue). Percentages shown on the “All” and M/C lines are percentage of the total number of
residues considered (600 for All, 48 for M/C).
(a) Catalytic residues acting via main-chain groups.
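The structural-locality analysis described for Figure 11(b) can be sketched as follows. This is an illustration under stated assumptions: each residue is represented by a single coordinate and a precomputed conservation score between 0 and 1, the centre of gravity is taken as the unweighted mean of the catalytic residue coordinates, and all names and values are hypothetical.

```python
import math


def centroid(points):
    """Unweighted mean of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def mean_conservation_in_sphere(residues, catalytic_ids, radius):
    """Mean conservation score of residues within `radius` angstroms of the
    centroid of the catalytic residues.

    residues maps a residue id to ((x, y, z), conservation_score).
    Returns None if no residue falls inside the sphere.
    """
    centre = centroid([residues[i][0] for i in catalytic_ids])
    scores = [score for coords, score in residues.values()
              if math.dist(coords, centre) <= radius]
    return sum(scores) / len(scores) if scores else None


# Hypothetical four-residue example; residues 1 and 2 are catalytic.
toy = {
    1: ((0.0, 0.0, 0.0), 1.0),
    2: ((2.0, 0.0, 0.0), 0.9),
    3: ((6.0, 0.0, 0.0), 0.5),
    4: ((11.0, 0.0, 0.0), 0.2),
}
profile = {r: mean_conservation_in_sphere(toy, [1, 2], r) for r in (4, 8, 12)}
```

In this toy case the mean score falls as the sphere grows, which is the trend reported for the real dataset. As noted above, a sphere is only an approximation to the shape of an active site.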
Hydrogen bonding
Table 6 shows the hydrogen bonds made by
all catalytic residues. Hydrogen bonds to water
molecules were excluded from this part of the
analysis although these are often critical components of catalysis. Of 598 catalytic residues
considered, the majority (93%) enter into at least
one hydrogen bond interaction, be it as a donor or
acceptor. This indicates that catalytic residues have
limited conformational freedom. Some 84% of residues make at least one hydrogen bond via either
their N–H or C=O group, while 75% of residues
make at least one hydrogen bond via a side-chain
atom. This suggests that the residue
conformation is usually strongly tethered through both the
main-chain and the side-chain.
Of the residues making hydrogen bonds via the N–H
or C=O groups, almost all (97%) hydrogen bond
to another residue in the protein, and a very small
proportion (8%) hydrogen bond to a ligand. Most
of these hydrogen bonds will probably be necessary
to maintain positioning of the catalytic residues.
Of the residues making hydrogen bonds via
side-chain atoms, almost all (94%) hydrogen bond
with other amino acid residues in the protein. A
relatively small proportion form a hydrogen bond
with a ligand (19%).
Looking at individual residues, a significantly
higher proportion of the positively charged amino
acid residues, lysine and arginine, hydrogen bond
to a ligand (38% and 51%, respectively) compared
with the negatively charged residues aspartate and
glutamate (13% and 12%, respectively).
This is possibly due to the fact that many metabolites are negatively charged. For example, phosphorylation of
compounds such as glucose, which gives them a negative charge, is a mechanism by
which they can be retained inside the cell.
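The comparison between positively and negatively charged residues can be re-derived from the side-chain columns of Table 6. The sketch below is a worked calculation only; the counts are taken from the table, while the pooled comparison and all variable names are additions for illustration.

```python
# (side-chain H-bonds to ligand, residues making >=1 side-chain H-bond), Table 6.
side_chain_hbonds = {
    "Lys": (17, 45),
    "Arg": (30, 59),
    "Asp": (10, 76),
    "Glu": (6, 47),
}


def ligand_percent(to_ligand, total):
    """Percentage of side-chain hydrogen-bonding residues that bond a ligand."""
    return 100.0 * to_ligand / total


def pooled_percent(names):
    """Same percentage, pooled over several residue types."""
    to_ligand = sum(side_chain_hbonds[n][0] for n in names)
    total = sum(side_chain_hbonds[n][1] for n in names)
    return ligand_percent(to_ligand, total)


per_residue = {name: ligand_percent(*counts)
               for name, counts in side_chain_hbonds.items()}
positive = pooled_percent(["Lys", "Arg"])
negative = pooled_percent(["Asp", "Glu"])
```

Pooling the two positively charged types and the two negatively charged types gives a single pair of percentages that can be compared directly.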
All residues taking part in catalysis via their
main-chain groups form hydrogen bonds with the
protein (94%) or with the ligand (41%). Again,
these residues have tethered conformations. Only
21% form side-chain hydrogen bonds, reflecting in
part the high percentage of glycine residues, but
also the non-involvement of the side-chain in
catalysis.
Quaternary structure/domain usage
Almost all enzymes in our dataset (159) have
their active site contained within just one subunit,
with only 19 out of 178 enzymes (11%) having
catalytic residues in more than one subunit of the
enzyme, i.e. the active site is at the interface of
two subunits. Of these 19 enzymes, 17 have catalytic residues split between two subunits, while
just two have catalytic residues split between
three subunits.
Table 7. Catalytic residue functions
                    Acid/base  Nucleophile  Transition state  Activates water/  Activates  Other (radical/
Residue (total)                             stabiliser        cofactor/residue  substrate  modified)
Histidine (113)     58         4            18                37                13         1
Aspartate (92)      31         6            10                45                6          –
Arginine (68)       6          –            51                9                 5          –
Glutamate (67)      30         –            7                 31                5          –
Lysine (56)         13         1            24                11                9          4
Cysteine (39)       6          21           2                 5                 1          7
Tyrosine (34)       17         1            7                 4                 5          2
Asparagine (28)     1          –            19                6                 5          –
Serine (27)         6          9            4                 9                 1          –
Glycine (24)        –          –            19                3                 1          1
Threonine (18)      3          1            10                4                 –          –
Glutamine (15)      –          –            7                 5                 3          –
Tryptophan (9)      –          –            3                 3                 –          3
Phenylalanine (7)   –          –            5                 1                 1          –
Leucine (7)         –          –            3                 2                 2          –
Methionine (4)      –          –            2                 1                 1          –
Alanine (2)         –          –            2                 –                 –          –
Isoleucine (2)      –          –            –                 –                 2          –
Proline (2)         1          –            –                 –                 1          –
Valine (1)          –          –            –                 1                 –          –
All (615)           172 (28%)  44 (7.2%)    193 (31%)         178 (29%)         60 (9.8%)  17 (2.7%)
No. of enzymes with at least one residue performing this function
                    106        44           96                104               41         11
Variation in no. of residues performing this function in any one enzyme
                    1–7        1            1–6               1–6               1–6        1–2
Residue functions are as defined in Table 2. The activate water, activate cofactor and primer categories are grouped into one, as are
the radical and modified groups. “All” percentages add up to more than 100 as a residue can have more than one function assigned to
it. The range of numbers of residues which can perform each function in any one enzyme using that function is given, e.g. in an
enzyme which uses at least one residue as an acid/base, there may be anything from one to seven catalytic residues performing that
function in that enzyme.
In addition, 108 out of 178 enzymes (60%) have
more than one domain. Of these, 35 have catalytic
residues split across more than one domain.
However, as this analysis deals with catalytic
residues and not with residues that only bind substrate or ligand, this number is probably an underestimate of enzymes whose active site is found at
an interface between domains and/or subunits.
Functions
Table 7 shows catalytic residue function as
defined by the classification previously described.
Of 615 catalytic residues, roughly equal proportions are involved in stabilising a proposed
transition state intermediate, activating a water molecule,
cofactor or other residue, and acting as an acid/base
(31%, 29% and 28%, respectively). Approximately
10% of residues activate the substrate in some
way, while 7% of residues form a covalent intermediate with the substrate via nucleophilic
attack. A very small number act as radicals or are
modified to perform their function.
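The proportions quoted above follow from the “All” row of Table 7, as the short calculation below illustrates. The counts are taken from the table; the dictionary keys and variable names are hypothetical labels chosen for this sketch.

```python
# Number of catalytic residues assigned each function ("All" row of Table 7).
function_counts = {
    "acid/base": 172,
    "nucleophile": 44,
    "transition state stabiliser": 193,
    "activates water/cofactor/residue": 178,
    "activates substrate": 60,
    "other (radical/modified)": 17,
}
total_residues = 615

# Share of all catalytic residues performing each function, in percent.
shares = {name: 100.0 * n / total_residues
          for name, n in function_counts.items()}

# A residue may carry more than one function, so the number of assignments
# exceeds the number of residues and the shares sum to more than 100%.
total_assignments = sum(function_counts.values())
mean_functions_per_residue = total_assignments / total_residues
```

The excess of assignments over residues is the reason given in the footnote to Table 7 for the percentages adding up to more than 100.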
Just over half of enzymes in this dataset have at
least one residue stabilising a proposed transition
state. The actual number of residues performing
this function in each enzyme can range from one
to six. Typically enzymes will have 1–3 residues
acting in this way. Just three enzymes use six
transition-state stabilising residues: pentalenene
synthase, 2-haloacid dehalogenase and adenylate
kinase. Pentalenene synthase has to stabilise
a positively charged carbocation intermediate, as
well as a negatively charged pyrophosphate
leaving group.28 The mechanism of 2-haloacid
dehalogenase involves an ester hydrolysis step
that requires an oxyanion hole to stabilise the negatively charged
intermediate, as well as a halide-stabilising cradle for the
leaving halide ion.29,30 Adenylate kinase uses mainly
positively charged residues to stabilise the negatively charged penta-coordinated transition state
which occurs during phosphate group transfer.31,32
Approximately half of the enzymes in this dataset have at least one residue acting as an acid
or base, while the actual number of residues performing this function ranges from one to seven.
Only one enzyme, aconitase, uses seven residues: it
uses three different ion pairs to
transfer protons to hydroxide for elimination as
water, as well as a base to extract a proton from
the substrate.33,34 Typically enzymes will use 1–2
acid/base residues.
Almost 60% of enzymes in this dataset use at
least one residue to activate a water, cofactor or
other residue. Typically enzymes will use 1–3
residues in this way, but there are exceptions
which use five or six residues. High
molecular weight acid phosphatase uses four residues to alter the pKa of the nucleophile involved
in the reaction and one residue to activate a water
molecule for attack on the enzyme–substrate
intermediate.35,36 Glutathione reductase uses six
residues, four of which are involved in facilitating
hydride transfer from NADPH to FAD. The other
two are responsible for activating the cysteine
nucleophile.21,22
Just over 20% of enzymes have a residue which
activates the substrate in some way. The number
of residues performing this role in each enzyme
can vary from one to six. Typically 1–2 residues
will perform this role. However, dihydrofolate
reductase uses six residues in this way to put electrostatic and steric strain on the substrate in order
for the reaction to occur.37
A quarter of enzymes in this dataset use a
nucleophile, but in each of these enzymes there is
only ever one nucleophilic residue. A very small
proportion of enzymes in this dataset employ
radical formation or residue modification for catalysis. Usually 1–2 residues in each of these enzymes
play such a role.
There are typical roles for one or two residues,
for instance the nucleophiles are generally cysteine,
serine and occasionally aspartate (but surprisingly,
hardly ever the similar corresponding residues
threonine and glutamate—this could be due to
increased bulkiness as these residues have an
extra carbon in their side-chain, or possibly
increased flexibility of the side-chain). The major
role of arginine is to stabilise the transition state,
while the negatively charged residues aspartate
and glutamate are typically acid/bases. The major
role of histidine is also as an acid/base, which it
performs more often than expected on average.
The most common function for the hydrophobic
group of residues (G, F, L, M, A, I, P, V) is in the
stabilisation of a proposed transition state intermediate, usually but not always via the main-chain groups.
Discussion
Caveats
The classification of catalytic residues presented
here is dependent on manual extraction of information from the primary literature. The residue
selection used is, therefore, only as complete as
the literature from which it was extracted. For
instance, if oxyanion hole-stabilising residues
have not been identified in an enzyme that clearly
utilizes a serine protease-like mechanism, they
were not included in the analysis. Information in
the literature is, in turn, dependent on the accuracy
and reliability of structural data deposited in the
PDB, and mutagenesis studies from which catalytic
residues and mechanisms are inferred.
Not all the dataset has been fully classified in
CATH, and although every effort has been made
to ensure that the dataset is non-redundant, as
those partially classified domains become fully
classified, previously undetected homologies
may come to light. Analysis of catalytic residue
conservation could only be performed on those
enzyme families whose sequences, when PSI-BLASTed, produced a suitably diverse alignment
(60% of the dataset).
Conclusions
This work represents a structural and functional
analysis of enzyme catalytic residues across a dataset of 178 enzymes, chosen using the strict criteria
described herein. Catalytic residue types are
limited, with just six residue types (H, C, E, D, R,
K) accounting for 70% of all catalytic residues.
Surprisingly, serine, usually thought of as
a typical example of a catalytic residue type, is not
included in this set. Catalytic residues have very
limited exposure to solvent (as defined by relative
accessibility) despite their polarity. They are very
precisely positioned and held in place, as shown
by their low B-factors and hydrogen bonding.
Nearly all catalytic residues are hydrogen bonded
via backbone groups, and three-quarters also via
side-chain groups. Some of these hydrogen bonds
will be important to maintain the structural integrity of the active site, others will be important in
setting up the reaction that the enzyme performs.
In spite of their apparent rigidity, catalytic residues
are often found in a coil environment, and the vast
majority are found in a cleft. Catalytic residues are
highly conserved, as is their local three-dimensional environment. Residues in the local three-dimensional
environment are more conserved than residues close by in sequence. The most common catalytic residue function is to stabilise a proposed
transition state intermediate. Primary catalytic residue functions (acid/base, nucleophile and transition state stabilisation) account for 65% of all
catalytic residue functions. Typically the number
of residues performing any one function in an
enzyme ranges from one to three, but there are
some enzymes which use up to six or seven
residues for functions such as transition state
stabilisation, acid/base, activating the substrate
and activating water, cofactor or other residue.
This could be in part due to variation in the
number of steps involved in a catalytic mechanism,
but could also be due to variation in the number of
residues quoted by authors as having a functional
role.
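The statement that six residue types account for about 70% of all catalytic residues can be checked against the residue totals in Table 7. The sketch below ranks residue types by frequency and accumulates them until a chosen coverage is reached; the function name and the threshold argument are hypothetical, while the counts are those of the table.

```python
# Total catalytic residues of each type (Table 7).
residue_totals = {
    "His": 113, "Asp": 92, "Arg": 68, "Glu": 67, "Lys": 56, "Cys": 39,
    "Tyr": 34, "Asn": 28, "Ser": 27, "Gly": 24, "Thr": 18, "Gln": 15,
    "Trp": 9, "Phe": 7, "Leu": 7, "Met": 4, "Ala": 2, "Ile": 2,
    "Pro": 2, "Val": 1,
}


def smallest_covering_set(counts, threshold_percent):
    """Take residue types in order of decreasing count until their combined
    share of all residues reaches threshold_percent."""
    total = sum(counts.values())
    covered = 0
    chosen = []
    for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        if 100.0 * covered / total >= threshold_percent:
            break
        chosen.append(name)
        covered += n
    return chosen, 100.0 * covered / total


top_types, coverage = smallest_covering_set(residue_totals, 70)
serine_share = 100.0 * residue_totals["Ser"] / sum(residue_totals.values())
```

With a threshold of 70%, the ranking stops after the six charged or polar types named in the text, and serine contributes only a small share on its own.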
Catalytic residues are most commonly found
confined to one subunit or within a single domain.
However, the fact that almost a third of enzymes
in our dataset have their catalytic residues split
across several subunits and/or domains has
interesting implications for enzyme evolution. If
one considers that the basic unit of protein structure is the domain, how did an active site evolve
with catalytic residues split across two or more
domains or subunits? One hypothesis is that the
primitive enzyme had its catalytic machinery on
one domain, and catalysed the first step of a reaction. Later steps may have involved a process
that occurred naturally over time (e.g.
hydrolysis). Then, by chance, the enzyme may
have evolved another domain with residues that
were well placed to speed up these processes, or
stabilise intermediates so that the enzyme had an
optimised turnover and a selective advantage.
Another possible explanation is that convergent
evolution of two different functional elements on
two distinct domains occurred to form an enzyme
with an adapted function.
These results will provide a heuristic basis for
predicting catalytic residues in enzymes of
unknown function and hopefully facilitate a better
understanding of enzyme mechanisms.
Acknowledgements
G.J.B. is funded by a BBSRC CASE studentship
in association with Roche Products Ltd. We thank
Annabel Todd and Stuart Rison for helpful
discussion.
References
1. Walsh, C. (2001). Enabling the chemistry of life. Nature, 409, 226–231.
2. Blow, D., Birktoft, J. & Hartley, B. (1969). Role of a buried acid group in the mechanism of action of chymotrypsin. Nature, 221, 337–340.
3. Wright, C., Alden, R. & Kraut, J. (1969). Structure of subtilisin BPN′ at 2.5 angstrom resolution. Nature, 221, 235–242.
4. Wallace, A., Laskowski, R. & Thornton, J. (1996). Derivation of 3D coordinate templates for searching structural databases: application to Ser-His-Asp catalytic triads in the serine proteinases and lipases. Protein Sci. 5, 1001–1013.
5. Zvelebil, M. & Sternberg, M. (1988). Analysis and prediction of the location of catalytic residues in enzymes. Protein Eng. 2, 127–138.
6. Bernstein, F., Koetzle, T., Williams, G., Meyer, E. E. J., Brice, M., Rodgers, J. et al. (1977). The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112, 535–542.
7. Webb, E. (1992). Enzyme Nomenclature. Recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology. Academic Press, New York.
8. Orengo, C., Michie, A., Jones, S., Jones, D., Swindells, M. & Thornton, J. (1997). CATH – a hierarchic classification of protein domain structures. Structure, 5, 1093–1108.
9. Orengo, C., Pearl, F., Bray, J., Todd, A., Martin, A., Lo Conte, L. & Thornton, J. (1999). The CATH database provides insights into protein structure/function relationships. Nucl. Acids Res. 27, 275–279.
10. Bullock, T., Branchaud, B. & Remington, S. (1994). Structure of the complex of L-benzylsuccinate with wheat serine carboxypeptidase II at 2.0 Å resolution. Biochemistry, 33, 11127–11134.
11. Henrick, K. & Thornton, J. (1998). PQS: a protein quaternary structure file server. Trends Biochem. Sci. 23, 358–361.
12. Laskowski, R. (1995). SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions. J. Mol. Graph. 13, 323–330. See also pp. 307–308.
13. Altschul, S., Madden, T., Schaffer, A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl. Acids Res. 25, 3389–3402.
14. Valdar, W. & Thornton, J. (2001). Conservation helps to identify biologically relevant crystal contacts. J. Mol. Biol. 313, 399–416.
15. Dayhoff, M. O., Schwartz, R. M. & Orcutt, B. C. (1978). A model of evolutionary change in proteins: matrices for detecting distant relationships. In Atlas of Protein Sequence and Structure, vol. 5, pp. 345–358, National Biomedical Research Foundation, Washington, DC.
16. Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. J. 27, 379–423. See also pp. 623–656.
17. McDonald, I. & Thornton, J. (1994). Satisfying hydrogen bonding potential in proteins. J. Mol. Biol. 238, 777–793.
18. Hutchinson, E. & Thornton, J. (1996). PROMOTIF a program to identify and analyze structural motifs in proteins. Protein Sci. 5, 212–220.
19. Martin, A., Orengo, C., Hutchinson, E., Jones, S., Karmirantzou, M., Laskowski, R. et al. (1998). Protein folds and functions. Structure, 6, 875–884.
20. Hegyi, H. & Gerstein, M. (1999). The relationship between protein structure and function: a comprehensive survey with application to the yeast genome. J. Mol. Biol. 288, 147–164.
21. Pai, E. F. & Schulz, G. E. (1983). The catalytic mechanism of glutathione reductase as derived from X-ray diffraction analyses of reaction intermediates. J. Biol. Chem. 258, 1752–1757.
22. Karplus, P. A. & Schulz, G. E. (1989). Substrate binding and catalysis by glutathione reductase as derived from refined enzyme: substrate crystal structures at 2 Å resolution. J. Mol. Biol. 210, 163–180.
23. Silverman, R. (2000). The Organic Chemistry of Enzyme-catalysed Reactions, Academic Press, New York.
24. Scott, D., Otwinowski, Z., Gelb, M. & Sigler, P. (1990). Crystal structure of bee venom phospholipase A2 in a complex with a transition-state analogue. Science, 250, 1563–1566.
25. Yan, B. & Sun, Y. (1997). Glycine residues provide flexibility for enzyme active sites. J. Biol. Chem. 272, 3190–3194.
26. Kallenbach, N. (2001). Breaking open a protein barrel. Proc. Natl. Acad. Sci. USA, 98, 2958–2960.
27. Fersht, A. (1998). Structure and Mechanism in Protein Science, Freeman, San Francisco, CA.
28. Lesburg, C. A., Zhai, G., Cane, D. E. & Christianson, D. W. (1997). Crystal structure of pentalenene synthase: mechanistic insights on terpenoid cyclization reactions in biology. Science, 277, 1820–1824.
29. Li, Y. F., Hata, Y., Fujii, T., Hisano, T., Nishihara, M., Kurihara, T. & Esaki, N. (1998). Crystal structures of reaction intermediates of 2-haloacid dehalogenase and implications for the reaction mechanism. J. Biol. Chem. 273, 15035–15044.
30. Ridder, I. S., Rozeboom, H. J., Kalk, K. H. & Dijkstra, B. W. (1999). Crystal structures of intermediates in the dehalogenation of haloalkanoates by 2-haloacid dehalogenase. J. Biol. Chem. 274, 30672–30678.
31. Yan, H. G. & Tsai, M. D. (1991). Mechanism of adenylate kinase. Demonstration of a functional relationship between Aspartate 93 and Mg2+ by site-directed mutagenesis and proton, phosphorus-31, and magnesium-25 NMR. Biochemistry, 30, 5539–5546.
32. Muller, C. W. & Schulz, G. E. (1992). Structure of the complex between adenylate kinase from Escherichia coli and the inhibitor Ap5A refined at 1.9 Å resolution. Model for a catalytic transition state. J. Mol. Biol. 224, 159–177.
33. Zheng, L., Kennedy, M. C., Beinert, H. & Zalkin, H. (1992). Mutational analysis of active site residues in pig heart aconitase. J. Biol. Chem. 267, 7895–7903.
34. Lauble, H., Kennedy, M. C., Beinert, H. & Stout, C. D. (1992). Crystal structures of aconitase with isocitrate and nitroisocitrate bound. Biochemistry, 31, 2735–2748.
35. Lindqvist, Y., Schneider, G. & Vihko, P. (1994). Crystal structures of rat acid phosphatase complexed with the transition-state analogs vanadate and molybdate. Implications for the reaction mechanism. Eur. J. Biochem. 221, 139–142.
36. Zhang, M., Zhou, M., Van Etten, R. L. & Stauffacher, C. V. (1997). Crystal structure of bovine low molecular weight phosphotyrosyl phosphatase complexed with the transition state analog vanadate. Biochemistry, 36, 15–23.
37. Bystroff, C., Oatley, S. & Kraut, J. (1990). Crystal structures of Escherichia coli dihydrofolate reductase: the NADP+ holoenzyme and the folate·NADP+ ternary complex. Substrate binding and a model for the transition state. Biochemistry, 29, 3263–3277.
38. Toth, E. A. & Yeates, T. O. (2000). The structure of adenylosuccinate lyase, an enzyme with dual activity in the de novo purine biosynthetic pathway. Struct. Fold Des. 8, 163–174.
39. Todd, A., Orengo, C. & Thornton, J. (2001). Evolution of function in protein superfamilies, from a structural perspective. J. Mol. Biol. 307, 1113–1143.
40. Bairoch, A. (1993). The ENZYME data bank. Nucl. Acids Res. 21, 3155–3156.
41. Rawlings, N. & Barrett, A. (1995). Evolutionary families of metallopeptidases. Methods Enzymol. 248, 183–228.
42. Gomis-Ruth, F. & Stockler, W. (1993). Astacins, serralysins, snake venom and matrix metalloproteinases exhibit identical zinc-binding environments (hexxhxxgxxh and met-turn) and topologies and should be grouped into a common family, the metzincins. FEBS Letters, 331, 134–140.
43. Sugio, S., Kashima, A., Kishimoto, K., Peisach, D., Petsko, G., Ringe, D. et al. (1998). Crystal structures of L201A mutant of D-amino acid aminotransferase at 2.0 Å resolution: implication of the structural role of Leu201 in transamination. Protein Eng. 11, 613–619.
44. Plaga, W., Vielhaber, G., Wallach, J. & Knappe, J. (2000). Modification of Cys-418 of pyruvate formate-lyase by methacrylic acid, based on its radical mechanism. FEBS Letters, 466, 45–48.
45. Becker, A., Fritz-Wolf, K., Kabsch, W., Knappe, J., Schultz, S. & Volker Wagner, A. (1999). Structure and mechanism of the glycyl radical enzyme pyruvate formate-lyase. Nature Struct. Biol. 6, 969–975.
46. Ridder, I., Rozeboom, H., Kalk, K. & Dijkstra, B. (1999). Crystal structures of intermediates in the dehalogenation of haloalkanoates by L-2-haloacid dehalogenase. J. Biol. Chem. 274, 30672–30678.
47. Lesburg, C., Zhai, G., Cane, D. & Christianson, D. (1997). Crystal structure of pentalenene synthase: mechanistic insights on terpenoid cyclization reactions in biology. Science, 277, 1820–1824.
48. Czerwinski, R., Harris, T., Massiah, M., Mildvan, A. & Whitman, C. (2001). The structural basis for the perturbed pKa of the catalytic base in 4-oxalocrotonate tautomerase: kinetic and structural effects of mutations of Phe-50. Biochemistry, 40, 1984–1995.
Edited by M. Levitt
(Received 13 March 2002; received in revised form 26 July 2002; accepted 10 August 2002)