Protein Engineering vol.10 no.9 pp.999–1012, 1997
Hydrogen bonds and salt bridges across protein–protein interfaces
Dong Xu¹, Chung-Jung Tsai² and Ruth Nussinov¹,³,⁴
¹Laboratory of Mathematical Biology, IRSP, SAIC Frederick, NCI–FCRDC,
Frederick, MD 21702-1201, ²Laboratory of Mathematical Biology, NCI–
FCRDC, PO Box B, Frederick, MD 21702-1201, USA and ³Sackler Institute
of Molecular Medicine, Tel Aviv University, Tel Aviv 69978, Israel
⁴To whom correspondence should be addressed, at the first address
To understand further, and to utilize, the interactions
across protein–protein interfaces, we carried out an analysis
of the hydrogen bonds and of the salt bridges in a collection
of 319 non-redundant protein–protein interfaces derived
from high-quality X-ray structures. We found that the
geometry of the hydrogen bonds across protein interfaces
is generally less optimal and has a wider distribution
than typically observed within the chains. This difference
originates from the more hydrophilic side chains buried in
the binding interface than in the folded monomer interior.
Protein folding differs from protein binding. Whereas in
folding practically all degrees of freedom are available to
the chain to attain its optimal configuration, this is not the
case for rigid binding, where the protein molecules are
already folded, with only six degrees of translational and
rotational freedom available to the chains to achieve their
most favorable bound configuration. These constraints
force many polar/charged residues buried in the interface
to form weak hydrogen bonds with protein atoms, rather
than strongly hydrogen bonding to the solvent. Since
interfacial hydrogen bonds are weaker than the intra-chain
ones, and so compete less effectively with the binding of water, more water
molecules are involved in bridging hydrogen bond networks
across the protein interface than in the protein interior.
Interfacial water molecules both mediate non-complementary donor–donor or acceptor–acceptor pairs, and connect
non-optimally oriented donor–acceptor pairs. These differences between the interfacial hydrogen bonding patterns
and the intra-chain ones further substantiate the notion
that protein complexes formed by rigid binding may be
far away from the global minimum conformations. Moreover, we summarize the pattern of charge complementarity
and of the conservation of hydrogen bond network across
binding interfaces. We further illustrate the utility of this
study in understanding the specificity of protein–protein
associations, and hence in docking prediction and molecular
(inhibitor) design.
Keywords: bound water/hydrogen bond/molecular recognition/
protein association/salt bridge/statistical analysis
Introduction
Hydrogen bonds and salt bridges play a central role in protein
binding. The protein surface is studded with many hydrophilic
residues. It has been reported that the average charge density
of native proteins is 1.4 charged groups per 100 Å² of protein
surface (Barlow and Thornton, 1986). Hence the binding
© Oxford University Press
interface, which basically consists of protein surfaces, is
generally more hydrophilic than the protein interior. In addition,
as early studies have shown, protein active sites often provide
electrostatic complementarity to the charge distribution of the
binding substrates (Warshel and Russell, 1984; Cherfils et al.,
1991; Novotny and Sharp, 1992; Creighton, 1993). Taken
together, binding interfaces tend to form more hydrogen bonds
and salt bridges than protein interiors.
Aside from the different densities, the hydrogen bonds and
salt bridges across the binding interfaces and in the protein
interiors also differ in their relative contributions to the
energetics. As shown in our earlier studies (Xu et al., 1997),
electrostatic interactions play a more important role in binding
than in folding. Hence interfacial hydrogen bonds and salt
bridges, as the major contributors to the electrostatic interactions between proteins, play a more important role in binding
than the intra-monomer hydrogen bonds and salt bridges in
folding. Furthermore, unlike intra-chain interactions in folded
proteins, bound water molecules often help mediate protein–
protein binding (Creighton, 1993; Bhat et al., 1994). Consequently, ordered water molecules bridging the hydrogen
bond network between proteins contribute to the stabilization
of the complexes (Bhat et al., 1994; Helms and Wade, 1995).
Protein–protein association differs from protein folding. In
folding the backbone of the folding polypeptide chain is in
principle free to explore all configurations, leading to an
optimally folded monomer structure. All degrees of freedom
are available to the backbone in its conformational search.
That, however, is not the case in the binding of two protein
molecules. For most reversible associations, the structures of
the proteins do not change significantly upon complex formation and follow a three-state model in their binding process.
Initially, the unfolded chains fold to their native configurations.
Subsequently, these already folded proteins associate to form
their bound state. In this case, although some conformational
rearrangements are observed, only rather minor adjustments
are generally made to optimize the binding, and most of these
are in side chain movements (Norel et al., 1997).
This basic difference between folding and binding implies
that although the types of interactions involved in both processes are similar, their relative contributions necessarily differ.
We have already shown that the hydrophobic effect, although
critically important in binding, is not as dominant as in folding
(Tsai et al., 1997). Furthermore, we have shown that a large
variability is observed among the interfaces. This can be
straightforwardly rationalized, as each of the chains has first
folded to attain its most stable configuration, with its hydrophobic residues largely buried in the core. The binding monomers seek to optimize their interacting interfaces; however,
they have only six rotational and translational degrees of
freedom to explore. In other words, the complex only reaches
the optimal state in a subset of the conformation space. In
addition, the packing of the hydrophobic residues in the interior
has already resulted in pushing the hydrophilic residues to the
surface, and hence in a priori limiting the extent of the potential
hydrophobic effect at the interface.
In approaching the role played by hydrogen bonds in folding
as compared with binding, one needs to bear in mind another
fundamental difference. In folding, the buried polar groups in
the backbone of protein chains are generally satisfied by
forming hydrogen bonds. In doing so, depending on the
sequence, a particular secondary structure is chosen. In binding,
the secondary structures are already formed. While a β-sheet
is sometimes extended across the binding interface, an α-helix
can only be formed by a single, continuous chain.
These considerations have several implications. First, unlike
in the case of folding, in rigid binding a complex may exist
in a configuration far from its global minimum (Lin et al.,
1995; Xu et al., 1997). Second, on the practical side, this
immediately suggests a reason for the difficulty in devising
approaches which would energetically distinguish between
‘correct’ and ‘incorrect’ bound configurations. The energy gap
may be expected to be significantly smaller, making it very
difficult to score predicted complex structures by using energetic parameters (Finkelstein et al., 1995). However, such
predictions are extremely important in molecular and drug
design. The question of the hydrogen bonds is particularly
relevant in this regard. Certainly, owing to the larger fraction
of polar, and charged, surface residues, their contribution to
stabilizing the interacting molecules is important. However,
here geometry plays a key role in determining their quality.
Owing to the constraints imposed by bond lengths and bond
angles, the geometries of the hydrogen bonds are unlikely to
be optimized. The extent and ranges of these, and the role
played by the water molecules in this regard, have implications
in ligand design.
Hydrogen bonds and salt bridges are particularly essential
in determining binding specificity (Fersht, 1984; Honig and
Yang, 1995). Specific protein–protein association is a pattern
recognition process. There are two conditions for binding to
occur, namely geometrical complementarity and stability in
energetics. The hydrophobic effect, hydrogen bonds and salt
bridges all play very important roles in energetics. A hydrogen
bond or a salt bridge can provide favorable free energy to the
binding (Bartlett and Marlowe, 1984; Gao et al., 1996; Xu
et al., 1997). On the other hand, an unfulfilled hydrogen bond
donor/acceptor, or an isolated charge that does not form a salt
bridge, when buried in the protein interface, could substantially
destabilize binding, owing to the desolvation effect. Such a
contrast in energetics contributes to a high selectivity in
matching the hydrogen bonds and salt bridges between proteins,
and confers binding specificity. A statistical analysis of hydrogen bonds and of salt bridges across protein–protein interfaces
bears on the issue of binding specificity, and hence has direct
implications for rational inhibitor design.
Studies of inter-chain hydrogen bonds and salt bridges can
provide an insight into the role of the hydrogen bonds and
salt bridges in domain (folding unit) packing within a monomer
as well. The ‘docking’ of folding units, or of compact,
hydrophobic, independently folding nuclei (Tsai and Nussinov,
1997), during the protein folding process is similar to that
taking place in binding. Both involve recognition, whether
inter- or intramolecular. Such a similarity has been elegantly
revealed in the three-dimensional domain swapping, which
sometimes takes place during oligomer assembly. There, one
domain of a monomeric protein is replaced by the same domain
from an identical chain of its twin protein (Bennett et al.,
1994, 1995). The special role of hydrogen bonds and salt
bridges between protein domains has been noticed. For
example, some subunits in allosteric proteins bind predominantly through ion pairs and hydrogen bonds (Korn and
Burnett, 1991), and a salt bridge network can connect protein
subunits or join two secondary structures to form quaternary
structures (Musafia et al., 1995). Although we did not carry
out a study for the domain interfaces as we have done here
for the protein interfaces, they may be expected to behave
similarly. In terms of their inter-domain interface composition,
we expect them to contain fewer polar residues than
inter-chain interfaces, although appreciably
more than are observed in protein cores.
The prevailing view holds that the hydrophobic effect has
a dominant role in stabilizing protein structures. However, the
contribution of hydrogen bonds and of salt bridges has recently
drawn increasing attention (Shirley et al., 1992; Marqusee and
Sauer, 1994; Wilson et al., 1994; Hendsch et al., 1996; Myers
and Pace, 1996; Tissot et al., 1996; Xu et al., 1997). Statistical
analyses have been carried out for hydrogen bonds (Baker and
Hubbard, 1984; Jeffrey and Saenger, 1991; McDonald and
Thornton, 1994) and salt bridges (Barlow and Thornton, 1983;
Rashin and Honig, 1984; Hendsch and Tidor, 1994; Musafia
et al., 1995; Gandini et al., 1996) in globular proteins. Few
studies have addressed those across protein binding
interfaces. Notably, Janin and Chothia (1990)
analyzed hydrogen bonds for 15 protein–protein interfaces.
We have compiled a non-redundant dataset of protein–
protein interfaces (Tsai, 1996; Tsai et al., 1996), enabling us
to address the questions posed above. By excluding intra-chain
hydrogen bonds and salt bridges, many of which are enforced
by chemical bonds or by secondary structures, we can obtain
exclusive information on inter-chain associations. Here we
provide a comprehensive statistical investigation of hydrogen
bonds, salt bridges and water molecules across protein interfaces, utilizing a substantially larger dataset. We describe our
results, their implications and utility.
In the next section, we describe the statistical methods
which we employed. In the subsequent section, we present
results of the statistical analysis of hydrogen bonds and salt
bridges across the 319 protein interfaces, illustrating some
structural details. We then discuss our observations, highlighting the similarities and the differences with the chains,
the binding specificity, and the implications for the energetics
of protein–protein association. Finally, we summarize our
conclusions.
Methods
In this section, we introduce the selected dataset and the
methods we employed in its statistical analysis.
Dataset selection
To obtain meaningful statistical properties of hydrogen bonds
and salt bridges across protein interfaces, one needs a high
quality, non-redundant experimental dataset. For this purpose,
starting with 376 non-homologous interfaces (Tsai, 1996),
generated from 1629 two-chain interfaces (Tsai et al., 1996)
present in the Protein Data Bank (PDB) (Bernstein et al.,
1977), we kept only the X-ray structures with a resolution
under 3.0 Å. We removed all others, including theoretical
models and NMR structures, whose quality is difficult to
assess. There are 319 interfaces left in our dataset, as shown
in Table I. Among them, 54 interfaces have a resolution of 1.8
Table I. PDB codes and chains of the protein interfaces employed
104l:AB
1aal:AB
1aap:AB
1aar:AB
1aaz:AB
1aab:CD
1afn:BC
1ake:AB
1alk:AB
1ank:AB
1aoz:AB
1atn:AD
1atp:EI
1bab:AC
1bab:AD
1bao:AB
1bao:BC
1bar:AB
1bbb:AB
1bbb:AC
1bbh:AB
1bbp:BC
1bbp:BD
1bbp:CD
1bbr:EK
1bbr:HE
1bbr:KG
1bbr:KN
1bbr:LE
1bbt:12
1bbt:14
1bgs:BF
1bgs:FG
1bov:AE
1bsc:BC
1bsr:AB
1c2r:AB
1cau:AB
1cax:CF
1cdd:AB
1cdt:AB
1chm:AB
1cho:EI
1cmb:AB
1col:AB
1cos:AB
1cpc:AB
1cpc:AK
1cse:EI
1csg:AB
1cth:AB
1d66:AB
1dfn:AB
1dhf:AB
1dhj:AB
1dsb:AB
1fc1:AB
1fc2:CD
1fcb:AB
1fia:AB
1fki:AB
1fvc:BD
1fvd:AC
1fxi:AD
1gd1:PQ
1gd1:PR
1gdh:AB
1ggi:LJ
1gla:FG
1glu:AB
1gma:AB
1gmf:AB
1gmq:AB
1gp1:AB
1gpa:AD
1hds:AC
1hge:AC
1hge:AD
1hge:BF
1hge:DE
1hge:EF
1hgt:HI
1hgt:LH
1hhi:BD
1hjj:AB
1hil:CD
1hle:AB
1hpl:AB
1hrh:AB
1hsa:AD
1hsl:AB
1hst:AB
1hvi:AB
1igf:LJ
1igf:LM
1isu:AB
1ith:AB
1jhl:HA
1jhl:LA
1l97:AB
1ldn:AB
1ldn:FG
1lga:AB
1lld:AB
1lmb:34
1log:AD
1lta:DC
1lta:EC
1lts:AC
1lts:DE
1lts:FC
1lys:AB
1mbl:AB
1mch:AB
1min:AD
1min:BD
1min:CD
1mol:AB
1ncb:NH
1ncc:NL
1nco:AB
1ndl:AC
1nip:AB
1nsc:AB
1opa:AB
1opb:AD
1ova:AB
1ova:BD
1ovo:AB
1ovo:BC
1ovo:CD
1paf:AB
1per:LR
1pfk:AB
1plf:AB
1plf:AC
1plf:BD
1pob:AB
1poe:AB
1pox:AB
1pp2:RL
1prc:CH
1prc:CL
1prc:CM
1prc:LH
1prc:LM
1prc:MH
1psa:AB
1psh:AB
1psp:AB
1pts:AB
1pya:AC
1pya:DE
1pya:DF
1pya:EF
1pyd:AB
1pyg:AB
1r09:13
1r09:24
1rag:BD
Å or less, as listed in Table II. In parallel, we compared the
interfaces with the protein interiors. The protein interiors were
analyzed by using the 550 chains involved in the interface
dataset of Table I.
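The selection rule described above (X-ray structures only, resolution under 3.0 Å, with a high-resolution subset at 1.8 Å or better) can be sketched as a simple filter. This is an illustrative sketch, not the procedure actually used to build the dataset: the `InterfaceEntry` record, the method labels and the entries marked hypothetical are assumptions introduced here; the two real codes carry the resolutions listed in Table II.

```python
from dataclasses import dataclass

@dataclass
class InterfaceEntry:
    code: str          # PDB code and chain pair, e.g. "1cho:EI"
    method: str        # assumed labels: "X-ray", "NMR" or "model"
    resolution: float  # in Å; only meaningful for X-ray entries

def select_xray(entries, max_resolution=3.0):
    """Keep X-ray interfaces whose resolution is under max_resolution (Å)."""
    return [e for e in entries
            if e.method == "X-ray" and e.resolution < max_resolution]

def high_resolution(entries, cutoff=1.8):
    """Keep interfaces with a resolution of cutoff (Å) or better."""
    return [e for e in entries if e.resolution <= cutoff]

candidates = [
    InterfaceEntry("1cho:EI", "X-ray", 1.8),   # resolution from Table II
    InterfaceEntry("1gma:AB", "X-ray", 0.86),  # resolution from Table II
    InterfaceEntry("xxxx:AB", "X-ray", 2.5),   # hypothetical entry
    InterfaceEntry("yyyy:AB", "X-ray", 3.2),   # hypothetical, too coarse
    InterfaceEntry("zzzz:AB", "NMR", 0.0),     # hypothetical, not X-ray
]

dataset = select_xray(candidates)
subset = high_resolution(dataset)
print([e.code for e in dataset])
print([e.code for e in subset])
```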
Hydrogen bond analysis
Each hydrogen bond can be characterized by the variables
defined in Figure 1. The software used to analyze hydrogen
bonds is HBPLUS (McDonald et al., 1993; McDonald and
Thornton, 1994). The program determines the positions of
missing hydrogens in the PDB and checks each donor–acceptor
pair to ascertain its fitness to the geometric criteria as follows:
the maximum distances are 3.9 Å between donor and acceptor
(d < 3.9 Å) and 2.5 Å between acceptor and hydrogen (r <
2.5 Å); the minimum angles are 90.0° for the angle of donor–
hydrogen–acceptor (θ > 90.0°), for the angle of donor–
acceptor–acceptor antecedent (φ > 90.0°) and for the angle of
hydrogen–acceptor–acceptor antecedent (γ > 90.0°) (Baker
and Hubbard, 1984). Amino-aromatic hydrogen bonds are not
taken into account in our analysis.
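The geometric criteria above can be checked directly once the coordinates of the donor, hydrogen, acceptor and acceptor antecedent are known. The following is a minimal sketch of that test only; it is not the HBPLUS program, it does not place missing hydrogens, and the function names and example coordinates are assumptions made for illustration.

```python
import math

def angle(a, b, c):
    """Angle a-b-c in degrees, with b as the vertex."""
    v1 = [x - y for x, y in zip(a, b)]
    v2 = [x - y for x, y in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    cos = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return math.degrees(math.acos(cos))

def hbond_geometry(donor, hydrogen, acceptor, antecedent):
    """Return the five variables of Figure 1 for one candidate pair."""
    return {
        "d": math.dist(donor, acceptor),               # donor-acceptor distance
        "r": math.dist(hydrogen, acceptor),            # hydrogen-acceptor distance
        "theta": angle(donor, hydrogen, acceptor),     # donor-H-acceptor
        "phi": angle(donor, acceptor, antecedent),     # donor-acceptor-antecedent
        "gamma": angle(hydrogen, acceptor, antecedent) # H-acceptor-antecedent
    }

def satisfies_criteria(g):
    """Apply the distance and angle cutoffs quoted in the text."""
    return (g["d"] < 3.9 and g["r"] < 2.5 and
            g["theta"] > 90.0 and g["phi"] > 90.0 and g["gamma"] > 90.0)

# Idealized, made-up coordinates (Å): a linear N-H...O contact.
good = hbond_geometry((0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                      (2.9, 0.0, 0.0), (3.5, 1.07, 0.0))
# Same arrangement with the acceptor group moved too far away.
far = hbond_geometry((0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                     (4.2, 0.0, 0.0), (4.8, 1.07, 0.0))
print(satisfies_criteria(good), satisfies_criteria(far))
```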
Definition of salt bridges
The salt bridge is evaluated according to the distance
between the donor atoms (Nζ of Lys, Nε, Nη1 and Nη2 of
Arg, Nδ1 and Nε2 of His and the amide N of the N-terminus)
and the acceptor atoms (Oε1 and Oε2 of Glu, Oδ1 and Oδ2
of Asp and the two carboxyl oxygen atoms of the C-terminus). If the distance is ≤4.0 Å, the pair is counted as
a salt bridge (Barlow and Thornton, 1983). When the
geometry is acceptable, a salt bridge is also counted as a
hydrogen bonding pair.
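The distance criterion for a salt bridge can be sketched as follows. This is an illustrative sketch under stated assumptions: atoms are represented as hypothetical `(residue, atom, xyz)` tuples using PDB atom names, the chain termini are omitted for brevity, and the coordinates are made up.

```python
import math

# Side-chain nitrogens treated as donors (PDB atom names).
DONORS = {("LYS", "NZ"), ("ARG", "NE"), ("ARG", "NH1"), ("ARG", "NH2"),
          ("HIS", "ND1"), ("HIS", "NE2")}
# Side-chain carboxylate oxygens treated as acceptors.
ACCEPTORS = {("GLU", "OE1"), ("GLU", "OE2"), ("ASP", "OD1"), ("ASP", "OD2")}

def is_salt_bridge(atom_a, atom_b, cutoff=4.0):
    """True if one atom is a donor, the other an acceptor, within cutoff Å."""
    (res_a, name_a, xyz_a), (res_b, name_b, xyz_b) = atom_a, atom_b
    a_key, b_key = (res_a, name_a), (res_b, name_b)
    paired = ((a_key in DONORS and b_key in ACCEPTORS) or
              (a_key in ACCEPTORS and b_key in DONORS))
    return paired and math.dist(xyz_a, xyz_b) <= cutoff

# Made-up coordinates (Å) for illustration.
lys_nz = ("LYS", "NZ", (0.0, 0.0, 0.0))
asp_near = ("ASP", "OD1", (3.0, 0.0, 0.0))
asp_far = ("ASP", "OD1", (4.5, 0.0, 0.0))
arg_nh1 = ("ARG", "NH1", (2.0, 0.0, 0.0))

print(is_salt_bridge(lys_nz, asp_near))  # within 4.0 Å
print(is_salt_bridge(lys_nz, asp_far))   # beyond the cutoff
print(is_salt_bridge(lys_nz, arg_nh1))   # two donors, no bridge
```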
1rbb:AB
1rcm:AB
1rhg:AC
1rhg:BC
1rib:AB
1rn1:AC
1rtp:12
1rtp:23
1sac:AB
1scm:AB
1scm:AC
1scm:BC
1sdy:AD
1sdy:BD
1shf:AB
1slt:AB
1sos:AE
1sos:FE
1sos:FG
1sps:CF
1srd:AD
1srn:AB
1sry:AB
1stf:EI
1tbe:AB
1tbp:AB
1tcb:AB
1tet:HP
1tet:LP
1tgx:AB
1tme:12
1tme:13
1tme:34
1tnf:AB
1tpk:BC
1tpl:AB
1trk:AB
1trm:AB
1tro:AC
1trz:BD
1tta:AB
1vfa:AB
1vfb:AC
1vfb:BC
1vmo:AB
1vsg:AB
1wsy:AB
1xim:AB
1xim:AC
1xim:AD
1yca:AB
201l:AB
256b:AB
2aai:AB
2abx:AB
2aza:AB
2azu:AD
2azu:BD
2bbk:HJ
2bbk:HL
2bbk:LJ
2ccy:AB
2cga:AB
2cgr:LH
2cwg:AB
2fb4:LH
2gst:AB
2hhm:AB
2hmz:CD
2hpd:AB
2kai:AI
2ltn:AC
2ltn:CD
2mlt:AB
2msb:AB
2mta:AC
2mta:HA
2mta:LA
2nck:RL
2ohx:AB
2pcb:AB
2pcb:AC
2pcc:AC
2pcc:CD
2pfk:AB
2phi:AB
2pka:AY
2pka:BY
2plv:12
2plv:14
2plv:23
2plv:34
2pol:AB
2rsl:AB
2rsl:BC
2scp:AB
2sod:BG
2spc:AB
2tbv:AB
2tmd:AB
2tpr:AB
2trx:AB
2tsc:AB
2utg:AB
3aah:AC
3aah:CD
3eca:AB
3eca:BC
3eca:BD
3gap:AB
3hhr:AB
3hhr:AC
3hhr:BC
3ink:CD
3ins:AB
3lad:AB
3mcg:12
3mds:AB
3mon:CD
3p2p:AB
3rp2:AB
3rub:LS
3sc2:AB
3sdh:AB
3sgb:EI
4azu:AB
4azu:BC
4cha:AB
4cts:AB
4fbp:AC
4fbp:AD
4fbp:CD
4htc:HI
4rub:AB
4rub:BC
4rub:BD
4rub:BV
4rub:CV
4rub:ST
4sbv:AB
4ts1:AB
5cna:BD
5cna:CD
5csc:AB
5rub:AB
6q21:AB
7aat:AB
7ins:AG
7ins:BG
7ins:DG
7tim:AB
8atc:AB
8cat:AB
8fab:CD
8rsa:AB
9gpb:BD
9ldt:AB
9rub:AB
9wga:AB
Burial of atoms
All surface areas reported in this paper are
solvent-accessible surface areas (ASA). They were calculated
using the program ACCESS (Hubbard, 1992), which is an
implementation of the Lee and Richards algorithm (Lee and
Richards, 1971). The solvent probe radius is 1.4 Å. We used
the default van der Waals radii in the ACCESS program.
In the surface area calculations, all the ordered water
molecules in the PDB structures were ignored. The solvent
accessibility of a residue is evaluated as the ratio between
the summed atomic accessible surface areas of that residue
in the protein and those of the same residue type (X) in an extended
Ala–X–Ala tripeptide (Hubbard et al., 1991).
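The residue accessibility ratio described above amounts to a single division once the atomic areas are available. The sketch below only illustrates that ratio; it does not compute surface areas, and both the per-atom values and the tripeptide reference value are hypothetical placeholders rather than numbers from ACCESS.

```python
def relative_accessibility(atomic_asa, reference_asa):
    """Summed atomic ASA of a residue divided by its Ala-X-Ala reference ASA.

    atomic_asa    -- per-atom ASA values (Å²) of the residue in the protein
    reference_asa -- summed ASA (Å²) of the same residue type in the
                     extended Ala-X-Ala tripeptide
    """
    if reference_asa <= 0:
        raise ValueError("reference ASA must be positive")
    return sum(atomic_asa) / reference_asa

# Hypothetical per-atom areas and reference value, for illustration only.
atom_areas = [5.0, 0.0, 12.5, 7.5]
ratio = relative_accessibility(atom_areas, reference_asa=200.0)
print(ratio)
```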
Calculation of expected values
Given the observed occurrence of pairs of types $A_i$ ($i = 1, 2, \ldots, p$)
and $B_j$ ($j = 1, 2, \ldots, q$), the expected value of the
corresponding pairs can be calculated by mixing $A_i$ and $B_j$
randomly. Assume the number of observed pairs of types
$A_i$ and $B_j$ is $n(A_i, B_j)$. First, we can evaluate the total
number of $A_i$, i.e.

$$n(A_i) = \sum_{j=1}^{q} n(A_i, B_j), \tag{1}$$

and the total number of $B_j$, i.e.

$$n(B_j) = \sum_{i=1}^{p} n(A_i, B_j). \tag{2}$$
Table II. Interfaces of high-resolution structures

Interface   Resolution (Å)   S (Å²)    hb   sb   nw
1aal:AB     1.6              613.9     2    0    4
1aap:AB     1.5              786.5     8    0    7
1bab:AC     1.5              636.4     8    1    0
1bab:AD     1.5              1331.9    7    1    9
1bbb:AB     1.7              1649.8    9    1    5
1bbb:AC     1.7              693.5     3    0    1
1cho:EI     1.8              1466.1    10   0    6
1cmb:AB     1.8              3532.6    14   1    4
1cpc:AK     1.66             2007.3    16   4    9
1cse:EI     1.2              1303.7    16   0    8
1dhj:AB     1.8              825.7     4    0    5
1gd1:PQ     1.8              2602.3    12   0    126
1gd1:PR     1.8              729.2     5    1    20
1gma:AB     0.86             1691.0    30   2    0
1gmq:AB     1.8              393.6     2    0    2
1hvi:AB     1.8              2859.6    24   4    1
1isu:AB     1.5              527.9     0    0    2
1lmb:34     1.8              1394.8    6    2    0
1lys:AB     1.72             528.4     1    0    5
1mol:AB     1.7              1010.8    3    2    3
1nco:AB     1.8              697.9     5    0    6
1nsc:AB     1.7              3652.4    19   3    16
1srn:AB     1.8              1599.2    13   2    5
1tgx:AB     1.55             986.5     1    0    24
1trz:BD     1.6              1007.0    7    0    3
1tta:AB     1.7              1584.5    16   0    10
1vfa:AB     1.8              1582.1    6    1    1
1vfb:AC     1.8              627.4     3    1    3
1vfb:BC     1.8              598.1     6    0    5
256b:AB     1.4              632.4     5    0    3
2aza:AB     1.8              938.6     0    0    71
2bbk:HJ     1.75             1390.4    8    2    192
2bbk:HL     1.75             3094.1    19   4    27
2bbk:LJ     1.75             1752.9    12   1    27
2ccy:AB     1.67             1644.6    3    0    4
2cga:AB     1.8              1133.6    0    0    9
2gst:AB     1.8              2806.6    11   4    17
2hmz:CD     1.66             1910.3    12   10   45
2ltn:AC     1.7              1386.8    14   1    7
2ltn:CD     1.7              6110.3    74   1    17
2msb:AB     1.7              1291.7    5    0    4
2ohx:AB     1.8              3168.5    18   0    147
2spc:AB     1.8              5209.9    16   12   5
2trx:AB     1.68             570.7     1    0    6
2utg:AB     1.64             3037.9    4    0    6
3ins:AB     1.5              1602.1    9    1    0
3mds:AB     1.8              1841.4    10   2    68
3sdh:AB     1.4              1960.3    11   4    13
3sgb:EI     1.8              1095.2    8    0    2
4cha:AB     1.68             2050.0    10   0    6
5rub:AB     1.7              5205.9    30   4    16
8fab:CD     1.8              3291.8    11   2    15
8rsa:AB     1.8              1582.6    9    4    4
9wga:AB     1.8              211.8     2    1    69

Interfaces whose PDB structures have a resolution of ≤1.8 Å. Resolution is the resolution of the PDB structure; S is the total buried ASA in the interface; hb
is the number of hydrogen bonds between the two chains; sb is the number of salt bridges across the interface and nw is the number of water molecules
which form hydrogen bonds with both chains across the interface.
Hydrogen bonds
There are 3442 hydrogen bonds across the 319 protein interfaces of our dataset. In the following, we analyze their
distribution, composition and geometry. In some cases, we
compare their properties with those of the 98 599 intra-chain
hydrogen bonds found in the 550 chains which compose the
interfaces in Table I.
Fig. 1. Geometrical variables of a hydrogen bond: d = distance between
donor and acceptor; r = distance between acceptor and hydrogen; θ =
angle of donor–hydrogen–acceptor; φ = angle of donor–acceptor–acceptor
antecedent; γ = angle of hydrogen–acceptor–acceptor antecedent.
The total number of all pairs is

$$n_{\mathrm{total}} = \sum_{i=1}^{p} \sum_{j=1}^{q} n(A_i, B_j). \tag{3}$$

Hence, the expected number of pairs of types $A_i$ and $B_j$ is

$$n_e(A_i, B_j) = \frac{n(A_i)\, n(B_j)\, n_{\mathrm{total}}}{\sum_{i=1}^{p} \sum_{j=1}^{q} n(A_i)\, n(B_j)}. \tag{4}$$
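The calculation of expected pair counts in Equations 1–4 can be sketched in a few lines. The function name and the observed counts below are hypothetical and serve only to illustrate the arithmetic; they are not data from this study.

```python
def expected_pair_counts(observed):
    """Expected counts n_e(A_i, B_j) from a table observed[i][j] = n(A_i, B_j)."""
    row_totals = [sum(row) for row in observed]        # n(A_i), Equation 1
    col_totals = [sum(col) for col in zip(*observed)]  # n(B_j), Equation 2
    n_total = sum(row_totals)                          # Equation 3
    # Denominator of Equation 4: sum over all i, j of n(A_i) * n(B_j).
    norm = sum(a * b for a in row_totals for b in col_totals)
    if norm == 0:
        raise ValueError("no observed pairs")
    return [[a * b * n_total / norm for b in col_totals] for a in row_totals]

# Hypothetical observed counts for two A types and two B types.
observed = [[30, 10],
            [20, 40]]
expected = expected_pair_counts(observed)
print(expected)
```

Because the denominator equals the square of the total number of pairs, the expected counts sum to the observed total, which is a convenient sanity check on any implementation.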
Results
In this section we present the results of a statistical analysis
of hydrogen bonds and salt bridges. Unless stated otherwise,
the statistics were carried out on the 319 protein interfaces
listed in Table I and described in the Methods section. In some
cases where the results are sensitive to the resolution of the
structures, we employed the 54 interfaces with high-resolution
structures in Table II.
Number of hydrogen bonds across interface. The average
number of hydrogen bonds per interface is 10.69 for the
interfaces listed in Table I. Similarly, there are 10.26 hydrogen
bonds per interface for the high-resolution structures in Table II.
This is in agreement with an earlier statistical analysis based
on a much smaller dataset (15 interfaces), where 8–13 (an
average of 10) hydrogen bonds were found per interface (Janin
and Chothia, 1990). The standard deviation for the number of
hydrogen bonds per interface in our study is 12.35, which is
very large. This is also reflected in the distribution as shown
in Figure 2a. The number of hydrogen bonds is strongly
correlated with the total buried accessible surface area (ASA)
of the interface, with a correlation coefficient of 0.89. The
number of hydrogen bonds, n, and the buried ASA of an
interface, s, can be matched by a linear relationship, i.e.
$$n = 5.34 \times 10^{-3}\ \mathrm{\AA}^{-2}\, s. \tag{5}$$
Figure 3 shows the n–s relationship and the fitting line of
Equation 5. The strong correlation between n and s also
illustrates a relatively narrow distribution of hydrogen bond
density across the protein interfaces, as shown in Figure 2b.
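As a worked illustration of Equation 5, the sketch below predicts the number of hydrogen bonds from the buried ASA and shows one way a proportional fit could be obtained. The text does not state how the line was fitted, so the least-squares fit through the origin is an assumption, and the function names are hypothetical. The example area is the value listed for 1cho:EI in Table II.

```python
SLOPE = 5.34e-3  # hydrogen bonds per Å² of buried ASA, from Equation 5

def predicted_hbonds(buried_asa):
    """Number of hydrogen bonds predicted by Equation 5 for a buried ASA (Å²)."""
    return SLOPE * buried_asa

def fit_through_origin(pairs):
    """Least-squares slope of n = k * s for (s, n) pairs (assumed fit method)."""
    num = sum(s * n for s, n in pairs)
    den = sum(s * s for s, _ in pairs)
    return num / den

# 1cho:EI buries 1466.1 Å² according to Table II.
estimate = predicted_hbonds(1466.1)
print(round(estimate, 2))

# Toy data lying exactly on a line, to show the fit recovers the slope.
toy_slope = fit_through_origin([(100.0, 1.0), (200.0, 2.0)])
print(toy_slope)
```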
The average hydrogen bond density is 4.74 × 10⁻³/Å² (close
to the fitting coefficient in Equation 5) with a standard deviation
of 2.88 × 10⁻³/Å². This means that on average one hydrogen
bond is expected if 100 Å² of ASA is buried on each side of
the protein interface.
There are 21 interfaces without any hydrogen bond. They
are 1ake:AB, 1bbr:KN, 1c2r:AB, 1fxi:AD, 1hpl:AB, 1isu:AB,
1ovo:CD, 1psp:AB, 1srd:AD, 1tcb:AB, 2aza:AB, 2cga:AB,
2mlt:AB, 2mta:HA, 2mta:LA, 2pcc:AC, 2rsl:AB, 4fbp:AD,
4rub:BV, 4rub:ST and 9gpb:BD. Most of them are interfaces
between two identical monomers in an asymmetric unit of
crystal packing which are not likely to occur in the solvent
(Janin and Rodier, 1995). The other interfaces are those
between subunits of quaternary structures. It is likely that all
rigid binding complexes involve hydrogen bonds across their
interfaces.
Fig. 2. Distribution of hydrogen bonds. (a) Number of protein interfaces versus number of hydrogen bonds in each protein interface; (b) number of protein
interfaces versus hydrogen bond density, i.e. number of hydrogen bonds per 100 Å² of buried ASA on the interface.
Fig. 3. Relationship between the number of hydrogen bonds, n, and the
buried ASA of an interface, s. The scattered dots represent the data of the
interface; the line is the linear fit of n–s, as shown in Equation 5.
Composition of hydrogen bonds. The composition of the
hydrogen bonds for the types oxygen–oxygen (O–O), nitrogen–
nitrogen (N–N) and oxygen–nitrogen (O–N) is shown in
Table III. If oxygen and nitrogen atoms contact in a random
manner to form hydrogen bonds, the expected percentages for
the types O–O, N–N and O–N are 32.6, 18.3 and 48.6%,
respectively. The statistics we have obtained reflect a strongly
biased percentage. The hydrogen bonds across the interfaces
are predominantly of the oxygen–nitrogen type. There are very
few hydrogen bonds between nitrogen atoms, because few
types of nitrogens in amino acids (only Nδ1 and Nε2 of
histidine) can serve as hydrogen bond acceptors.
Table III. Composition of hydrogen bonds per interface

Variable                             O–O     N–N     O–N
Mean (per interface)                 1.63    0.08    8.93
Standard deviation (per interface)   2.58    0.40    10.43
Percentage                           15.2    0.7     83.5
Expected percentage                  32.6    18.3    48.6

The rest of the hydrogen bonds are associated with sulfur atoms.
Table IV. Interfacial hydrogen bonds associated with main chains and side
chains

Variable                             Main chain–   Main chain–   Side chain–
                                     main chain    side chain    side chain
Mean (per interface)                 2.42          3.78          4.20
Standard deviation (per interface)   4.94          4.97          5.35
Percentage                           22.6          35.4          39.3
Expected percentage                  16.7          47.2          33.4

The rest of the hydrogen bonds are associated with groups other than amino
acids.
The composition of hydrogen bonds associated with main
chains and side chains is shown in Table IV. The percentage
occurrences of main chain–main chain, main chain–side chain
and side chain–side chain hydrogen bonds are 64.8, 22.8 and
12.4%, respectively, within chains, but 22.6, 35.4 and 39.3%,
respectively, across interfaces. The significantly higher proportion of main chain–main chain hydrogen bonds within chains is due
to the fact that protein interiors mostly consist of hydrophobic
residues which form well defined α-helices and β-sheets.
Although appreciably fewer hydrogen bonds are formed by
main chain atoms across the protein–protein interfaces, there
are substantially more main chain–main chain hydrogen bonds
(22.6%) than expected (16.7%). It is interesting to note that
the packing of some inter-protein backbones can mimic the
packing of intra-protein secondary structures. Figure 4 illustrates that the packing between subtilisin and the chymotrypsin
inhibitor forms β-sheets. In this case, the main chain–main
chain hydrogen bonds form a close compact structure similar
to the case in monomers. Charged groups are heavily involved
in the side chain–side chain packing across the interfaces.
About half (47.2%) of side chain–side chain hydrogen bonds
are salt bridges.
Fig. 4. Stereoview of the backbone conformation across the interface of the complex 2sni (with a resolution of 2.1 Å). The receptor subtilisin (chain E in
2sni) is in pink; the chymotrypsin inhibitor (chain I in 2sni) is shown in a combination of green (for carbon atoms), red (for oxygen atoms) and blue (for
nitrogen atoms). The dashed lines represent the hydrogen bonds. This picture and Figures 12, 14 and 16 were generated by the program QUANTA (Molecular
Simulations, 1994).
Fig. 5. Distribution [P(d)] of the distance between a donor and an acceptor,
d, for the inter-chain (solid lines) and intra-chain (broken lines) hydrogen
bonds. The interval of sample points is 0.05 Å in this figure and Figure 6.
Fig. 6. Distribution [P(r)] of the distance between hydrogen and acceptor, r,
for the inter-chain (solid lines) and intra-chain (broken lines) hydrogen
bonds.
Geometry of hydrogen bonds. Figure 5 shows the distribution
of the distance between a donor and an acceptor for the inter-chain and intra-chain hydrogen bonds. The average distance
between a donor and an acceptor across the protein interface
is 2.92 Å, with a standard deviation of 0.24 Å. The maximum
distance is 3.76 Å and the minimum distance is 1.83 Å. The
average distance for the inter-chain oxygen–nitrogen type is
2.93 Å, with a standard deviation of 0.21 Å. The average
value is very close to the N–O distance observed by neutron
diffraction in the crystal structure of amino acids, i.e. from
2.872 to 2.895 Å (Jeffrey and Saenger, 1991). However, the
distribution in our statistics is much wider than this range.
The distribution of intra-chain hydrogen bonds is similar to
that of the inter-chain hydrogen bonds, with an average distance
of 2.95 Å and a standard deviation of 0.21 Å.
The distribution of the hydrogen–acceptor distances, calculated by HBPLUS, is shown in Figure 6 for both the inter-chain and intra-chain hydrogen bonds. The distribution of the
inter-chain hydrogen bonds is wide, with an average of 2.03 Å
and a standard deviation of 0.24 Å. The hydrogen–acceptor
distance is one of the measurements of hydrogen bond strength:
the shorter the distance, the stronger is the hydrogen bond.
Only 1.14% of all the inter-chain hydrogen bonds have a
hydrogen–acceptor distance in the range 1.2–1.5 Å, which can
be considered as strong hydrogen bonds (Jeffrey and Saenger,
1991). Since normal and weak hydrogen bonding interactions
can be treated by electrostatics (Jeffrey and Saenger, 1991), it
is a good approximation to model all the inter-chain hydrogen
bonds by the electrostatic interactions between
their partial charges. Again, the intra-chain hydrogen bonds
have a similar distribution, with an average distance of 2.06 Å
and a standard deviation of 0.22 Å.
Fig. 7. Distribution [P(θ)] of the hydrogen bond angle (θ), i.e. the angle
between donor, hydrogen and acceptor, for the inter-chain (solid lines) and
intra-chain (broken lines) hydrogen bonds. The interval of the calculated
[θ, P(θ)] points is 2.0° in this figure and in Figures 9 and 10.
The wide diversity of hydrogen bonds in terms of geometry
is also reflected in the wide distribution of the bond angle
between donor, hydrogen and acceptor, as demonstrated in
Figure 7. The average bond angle of inter-chain hydrogen
bonds is 150.7°, with a standard deviation of 17.1°. The intra-chain hydrogen bonds have an average bond angle of 151.5°,
with a standard deviation of 16.3°. Hence inter-chain hydrogen
bonds are slightly more off-linear and have a slightly wider
distribution than the intra-chain ones. The bond angle is
another assessment of the hydrogen bond strength: the closer
the bond angle to 180°, the more significant is the electrostatic
contribution and the stronger is the hydrogen bond. Very strong
hydrogen bonds have bond angles around 180°, while normal
or weak bonds have angles in the range 160 ± 20° (Jeffrey
and Saenger, 1991). Therefore, the majority of the bonds are
normal or weak in terms of energetics, in accordance with the
conclusion drawn from the hydrogen–acceptor distance.
The distribution of inter-chain hydrogen bond strength can
be viewed clearly in the plot of bond angle versus the distance
between hydrogen and acceptor, as shown in Figure 8. Most
of the interfacial hydrogen bonds have a normal strength, with
a distance between hydrogen and acceptor of ~2 Å and
a bond angle of ~160°. There are not many weak bonds at the
lower right part of the figure, and there are few very strong
hydrogen bonds in the upper left corner.
There are significant differences between the intra-chain
and the inter-chain hydrogen bonds in the distribution of the
angle between hydrogen, acceptor and acceptor antecedent (γ),
as shown in Figure 9a. Although the distribution of interfacial
hydrogen bonds has a larger fluctuation than that of the intra-chain hydrogen bonds owing to the smaller sample, they are
clearly much more off-linear and have a significantly wider
distribution. Such differences are also revealed in the distribution of the angle between donor, acceptor and acceptor antecedent (φ), as shown in Figure 10a. The lower linearity of the γ
and φ angles in the interfacial hydrogen bonds compared with
the intra-monomer ones results in weaker dipolar interactions
between the donor–hydrogen and acceptor–acceptor antecedent dipoles. Hence the inter-chain hydrogen bonds are generally
weaker and have a larger diversity than the intra-chain ones.
Fig. 8. Hydrogen bond angle versus the distance between hydrogen and
acceptor for the 3442 inter-chain hydrogen bonds across protein interfaces.
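The geometric quantities discussed above (the hydrogen–acceptor distance, the donor–hydrogen–acceptor bond angle, γ and φ) can all be obtained from four atomic positions. The following is a minimal sketch, not the procedure used in this study; the function names and the example coordinates are hypothetical, and hydrogen positions are assumed to be already available.

```python
import math

def angle_deg(a, b, c):
    """Return the angle a-b-c in degrees, with b at the vertex."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(y * y for y in v2))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def hbond_geometry(donor, hydrogen, acceptor, antecedent):
    """Distances (Å) and angles (degrees) that describe one hydrogen bond."""
    return {
        "d_HA": math.dist(hydrogen, acceptor),             # hydrogen-acceptor distance
        "d_DA": math.dist(donor, acceptor),                # donor-acceptor distance
        "theta": angle_deg(donor, hydrogen, acceptor),     # donor-hydrogen-acceptor
        "gamma": angle_deg(hydrogen, acceptor, antecedent),  # hydrogen-acceptor-antecedent
        "phi": angle_deg(donor, acceptor, antecedent),     # donor-acceptor-antecedent
    }

# Hypothetical, idealized coordinates (Å) for a perfectly linear arrangement.
linear = hbond_geometry((0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                        (3.0, 0.0, 0.0), (4.2, 0.0, 0.0))
print(linear)
```

In this idealized case all three angles are 180°; real interfacial bonds fall well below that, as the distributions in Figures 7, 9 and 10 show.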
To understand the origin of the above differences, we
calculated the distribution of γ and φ for the main chain–main
chain hydrogen bonds and for the main chain–side chain and
side chain–side chain hydrogen bonds separately, as shown in
Figures 9b and 10b and in Table V. Compared with the overall
distribution, the main chain–main chain hydrogen bonds have
a similar distribution across the protein interface and within
the same chain. The similarity between the inter-chain and the
intra-chain hydrogen bonds is also observed in the distribution
of main chain–side chain and side chain–side chain types.
However, the main chain–main chain hydrogen bonds are
more linear and have a narrower distribution, i.e. have stronger
interactions than the main chain–side chain and side chain–
side chain types. Hence the difference between inter-chain and
intra-chain hydrogen bonds in the distribution of γ and φ
mainly arises from their different percentage occurrences in
the main chain–main chain type (22.6% in the protein interface
vs 64.8% in the same chain). Since the main chain–main chain
hydrogen bonds originate largely from secondary structure
elements, this is entirely understandable. In particular, a large
proportion of these in the monomers are α-helices.
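The composition argument above can be checked with a short calculation. The sketch below treats the overall mean γ as a weighted mixture of the two bond classes, using the class averages from Table V and the main chain–main chain percentages quoted in the text; the mixture model itself is an assumption made here for illustration, not a calculation reported in the paper.

```python
def mixture_mean(frac_mm, mean_mm, mean_other):
    """Overall mean when a fraction frac_mm of bonds is main chain-main chain."""
    return frac_mm * mean_mm + (1.0 - frac_mm) * mean_other

# Table V class averages for gamma (degrees) and the quoted type percentages.
gamma_intra = mixture_mean(0.648, 140.9, 128.3)
gamma_inter = mixture_mean(0.226, 145.4, 127.3)

print(round(gamma_intra, 1), round(gamma_inter, 1))
```

Under this assumption the intra-chain estimate agrees with the tabulated overall value of 136.4°, and the inter-chain estimate is several degrees lower, mostly because of the smaller main chain–main chain share. The inter-chain estimate is about 2° above the tabulated overall value; the source does not explain this gap, so the percentages and the table may refer to slightly different subsets.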
We also compared the distribution of γ and φ between
interfaces with large buried ASA and those with small ones.
No significant difference was found between these two groups.
The hydrogen bonds in interfaces with a total buried ASA of
≥2500 Å² have an average γ of 131.9°, with a standard
deviation of 19.0°, while others have an average γ of 132.9°,
with a standard deviation of 19.4°. Correspondingly, φ is
133.3 ± 19.3° for interfaces with large buried ASA and
134.5 ± 19.2° for others. Interfaces with small buried surface
areas are likely to represent rigid binding, whereas those with
large ones tend to reflect a change in the conformation of the
associating proteins upon binding. Nevertheless, the buried
ASA is not a clear-cut criterion for distinguishing between
rigid and flexible associations. For example, the flexible
protein–short peptide binding also has small buried ASA.
Hence our data are insufficient to differentiate between the
hydrogen bonds in rigid versus flexible binding.
Fig. 9. Distribution [P(γ)] of the angle of hydrogen–acceptor–acceptor antecedent (γ) for the inter-chain (solid lines) and intra-chain (broken lines) hydrogen
bonds. (a) All the hydrogen bonds; (b) the main chain–main chain hydrogen bonds (thick lines) vs the main chain–side chain and side chain–side chain ones
(thin lines).
Fig. 10. Distribution [P(φ)] of the angle of donor–acceptor–acceptor antecedent (φ) for the inter-chain (solid lines) and intra-chain (broken lines) hydrogen
bonds. (a) All the hydrogen bonds; (b) the main chain–main chain (thick lines) vs the main chain–side chain and side chain–side chain (thin
lines) hydrogen bonds.
Table V. Averages and standard deviations of γ and φ
Angle   Type          All            Main chain–main chain   Main chain–side chain and side chain–side chain
γ (°)   Inter-chain   129.3 ± 27.4   145.4 ± 17.0            127.3 ± 21.2
γ (°)   Intra-chain   136.4 ± 19.9   140.9 ± 19.3            128.3 ± 18.5
φ (°)   Inter-chain   129.9 ± 29.3   148.9 ± 14.9            129.1 ± 18.1
φ (°)   Intra-chain   140.1 ± 20.1   146.2 ± 18.0            128.9 ± 18.8
Burial of hydrogen bonds. Figure 11a indicates that most
interfacial donors/acceptors which form hydrogen bonds either
with the same or with a different chain are highly buried. (An
interfacial atom is defined as one whose ASA changes by
>0.1 Å² between the unbound chain and the complex.) The
more an interfacial donor or acceptor atom is buried, the more
likely it is to form a hydrogen bond with a protein atom (see
Figure 11b); 5153 out of 6727 (76.6%) fully buried oxygen/
nitrogen atoms form hydrogen bonds in comparison with 797
out of 965 (82.6%) for the high-resolution structures in Table II.
The discrepancy in the percentages between the dataset of the
319 protein interfaces and that with higher resolution structures
indicates that some actual hydrogen bonds across protein
interfaces are excluded owing to errors in atomic coordinates
in low-resolution structures. The hydrogen donors or acceptors
with non-zero accessibility to solvent and no hydrogen bonding
to protein atoms are most likely to form hydrogen bonds with
the solvent.
Specificity of hydrogen bonds. We observed a conservation in
the pattern of some interfacial hydrogen bonds. An example
is shown in the comparison between 1cho (α-chymotrypsin
complexed with the turkey ovomucoid third domain) and
1acb (α-chymotrypsin complexed with eglin C). Figure 12a
demonstrates that the backbone conformations in the binding
regions of the ovomucoid third domain and eglin C are
remarkably similar, although the two inhibitors differ in their
global structures. The inter-protein hydrogen bonds are also
highly conserved, as shown in Figure 12b.
Role of water. Table VI shows the occurrence of water in the
hydrogen bonding network across the interfaces of the high-resolution structures in Table II; 1070 water molecules are
involved in bridging 4061 polar atom pairs across the interface
by hydrogen bonds. On average, each water molecule connects
3.8 cross-chain atom pairs. Some water molecules have a
potential to form hydrogen bonds with more than four protein
donors/acceptors. These water molecules are most likely to
allocate their hydrogen bonds to the protein atoms dynamically,
in the so-called ‘flip-flop’ mechanism (Betzel et al., 1984;
Meyer, 1992). There are 19.8 waters per interface. However,
as shown in Table II, most interfaces have <10 waters. There
are some interfaces which have >30 waters, namely 1gd1:PQ,
2aza:AB, 2bbk:HJ, 2hmz:CD, 2ohx:AB, 3mds:AB and
9wga:AB, all of which are interfaces between two monomers
of the same protein in an asymmetric unit. The complexes
of these monomers are most likely to be enforced by the
crystallization, and may not be stable in the solvent. The
observed occurrences of the donor–H2O–donor and acceptor–H2O–acceptor
pairs are more frequent than their corresponding
expected values, indicating that bound interfacial waters tend
to mediate pairs of polar groups which cannot form hydrogen
bonds directly with each other.
Fig. 11. (a) Number of interfacial donors/acceptors which form hydrogen bonds versus solvent accessible surface area; (b) percentage of fulfilled donors/
acceptors hydrogen bonding to protein atoms at certain ASA.
Table VI. Inter-chain hydrogen bond bridges mediated by water
Type                     Occurrence   Expected
Donor–H2O–donor          617          525.6
Acceptor–H2O–acceptor    1756         1664.6
Donor–H2O–acceptor       1688         1870.8
Salt bridges
There are 623 salt bridges across the 319 protein interfaces.
On average, there are about two salt bridges per interface. The
number of salt bridges for each high-resolution structure is
listed in Table II.
Salt bridge distributions. The distribution of different types of
salt bridges is shown in Table VII. The count of each salt
bridge type is very close to its expected value, indicating a
lack of discrimination between specific salt bridge donors and
acceptors when they form salt links.
Table VII. Salt bridge distribution
Residue      Arg           Lys           His         N-terminal
Asp          179 (159.4)   100 (118.0)   11 (10.8)   4 (4.7)
Glu          148 (162.7)   138 (120.5)   11 (11.0)   2 (4.8)
C-terminal   12 (16.3)     13 (12.1)     1 (1.1)     4 (0.5)
Numbers in parentheses are the expected values for each type.
Charge distributions. Figure 13 shows the distribution of
opposite charge pairs and like charge pairs across the interfaces.
There are 18 058 like charge pairs within a cut-off of 20 Å.
There are 19 172 opposite charge pairs within the same cut-off. Comparison of the like charge with the opposite charge
pair distribution shows that the opposite charge pairs have a
strong peak of P(r)/r² at 2.75 Å (see Figure 13), suggesting
that salt bridges across protein interfaces are highly favorable.
However, charge complementarity does not always occur.
There are examples where close pairs of like charges are
observed across protein interfaces, as shown in Figure 14.
Burial of salt bridges. Figure 15 compares the solvent accessibility of the polar atoms on the side chains of salt bridges
across the interfaces with that within the chains. Most inter-chain salt bridges are highly buried, whereas intra-chain ones
are generally much less buried, indicating that the environment
of the interfacial charges is different from that of the charges
in monomers. In the case of protein complexes formed by
rigid binding, if both proteins were allowed to change their
conformations freely, the system could form a structure whose
environment of the interfacial charges is similar to that of the
charges in monomers. This further indicates that, unlike
folded monomers, protein complexes formed by rigid binding
may be far away from the global minimum conformation.
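The expected values in Table VII are consistent with a simple independence model, in which each expected count is the product of the corresponding row and column totals divided by the grand total. The text here does not state how the expected values were derived, so the sketch below is an assumption; it reproduces the Glu and C-terminal entries to within rounding and the Asp entries to within about 0.6.

```python
# Observed salt bridge counts from Table VII: rows are acidic groups,
# columns are basic groups.
observed = {
    "Asp": {"Arg": 179, "Lys": 100, "His": 11, "N-terminal": 4},
    "Glu": {"Arg": 148, "Lys": 138, "His": 11, "N-terminal": 2},
    "C-terminal": {"Arg": 12, "Lys": 13, "His": 1, "N-terminal": 4},
}

def expected_counts(table):
    """Expected counts if row and column types pair independently."""
    row_tot = {r: sum(cols.values()) for r, cols in table.items()}
    col_tot = {}
    for cols in table.values():
        for c, n in cols.items():
            col_tot[c] = col_tot.get(c, 0) + n
    total = sum(row_tot.values())
    return {r: {c: row_tot[r] * col_tot[c] / total for c in col_tot}
            for r in table}

expected = expected_counts(observed)
for acid, cols in expected.items():
    print(acid, {base: round(v, 1) for base, v in cols.items()})
```

The observed counts sum to 623, the total number of salt bridges quoted in the text.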
Discussion
Our statistical analysis is based on the high-quality X-ray
structures of proteins. It allows us to discuss the role of
hydrogen bonds, salt bridges and bound water molecules in
protein–protein associations. It further enables us to address
the similarities and the differences between protein–protein
binding and protein folding.
Similarities and differences between interfaces and
monomers
Our study illustrates both the similarities and the differences
between the interfacial and the intra-chain hydrogen bonds.
The distribution of the distances between the donor and
acceptor and between the hydrogen and the acceptor in the
interfaces and in the monomers is similar. This suggests that
the overall packing between two chains is similar to the packing
observed within monomers. In both cases, the hydrogen bonds
are generally not very strong, with a wide distribution in their
geometries. However, inspection of the angles between donor,
hydrogen and acceptor, and in particular of the angles between
hydrogen, acceptor and acceptor antecedent, and of those
between donor, acceptor and acceptor antecedent, reveals that
the interfacial hydrogen bonds are less optimal than the
intra-chain ones, with their geometries demonstrating wider
distributions. Although there is no other experimental evidence
available, the conclusion is justified since each angle was
calculated from the atomic coordinates of an experimental
protein structure.
Fig. 12. (a) Stereoview of the superimposed structures for 1cho (with a resolution of 1.8 Å) and 1acb (with a resolution of 2.0 Å). The α-chymotrypsin
portions of both complexes are matched at the Cα positions. (b) Stereoview of the superimposed residues at the binding sites shown in (a). The continuous
peptides, from left to right, are Val343, Thr344, Leu345, Asp346 and Leu347 of 1cho, and Cys316, Thr317, Leu318, Glu319 and Tyr320 of 1acb. The
surrounding residues of α-chymotrypsin, from left to right, are Gly216, Ser214, Ser195, Gly193 and Phe41, respectively. The small balls show the bound
water molecules. The dashed lines represent the hydrogen bonds. In both (a) and (b), the purple and orange represent α-chymotrypsin for 1cho and 1acb,
respectively; the blue shows the turkey ovomucoid third domain in 1cho; the red shows eglin C in 1acb.
Fig. 13. (a) Distribution of the distance between opposite charges (solid line) and like charges (dashed lines), P(r); (b) P(r) normalized by r². The interval for
calculating each point is 0.5 Å.
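A distribution of the kind shown in Figure 13 can be sketched as a histogram of charge–charge distances in 0.5 Å bins, with each bin then divided by r² to remove the geometric growth in the number of pairs with distance. The function name, the use of bin midpoints for r and the example distances are assumptions for illustration only.

```python
def radial_distribution(distances, bin_width=0.5, cutoff=20.0):
    """Return (r_mid, P(r), P(r)/r^2) for each bin up to the cut-off."""
    n_bins = int(round(cutoff / bin_width))
    counts = [0] * n_bins
    for r in distances:
        if 0.0 <= r < cutoff:
            # min() guards against a rounding overshoot at the last bin edge.
            counts[min(int(r // bin_width), n_bins - 1)] += 1
    total = sum(counts)
    result = []
    for i, n in enumerate(counts):
        r_mid = (i + 0.5) * bin_width
        p = n / total if total else 0.0
        result.append((r_mid, p, p / (r_mid * r_mid)))
    return result

# Hypothetical distances (Å) between charged groups, not data from the paper.
dist = radial_distribution([2.7, 2.8, 2.9, 10.2])
peak = max(dist, key=lambda row: row[2])
print(peak[0])
```

With 0.5 Å bins, distances between 2.5 and 3.0 Å fall in the bin centered at 2.75 Å, which matches the position of the peak reported for opposite charge pairs.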
Fig. 14. Examples of like charge pairs. (a) 2pcb (with a resolution of 2.8 Å): yeast cytochrome c peroxidase complex with horse heart cytochrome c; (b) 3sc2
(with a resolution of 2.2 Å): the A–B chains of serine carboxypeptidase II; (c) 1hvi (with a resolution of 1.8 Å): A–B chains of HIV-1 protease complexed
with the inhibitor A77003; (d) 3rp2 (with a resolution of 1.9 Å): A–B chains of rat mast cell protease.
Fig. 15. Distribution of solvent accessibility of the polar atoms on the side chains for salt bridges across the interfaces (a) and within the same chains (b).
Our study suggests that the different quality of hydrogen
bonds across protein interfaces compared with those within
the same chains arises from the larger number of main chain–
side chain and side chain–side chain hydrogen bonds which
are involved in binding than in folding. The chemical bonds
linking atoms in proteins limit their arrangements in both
folding and binding, and prevent polar groups from forming
high-quality hydrogen bonds like those observed in amino acid
crystals. This explains why the distribution is much wider in
both interfacial and intra-chain hydrogen bonds than in the
hydrogen bonds in amino acid crystals. On the other hand,
there is a difference in the constraints between main chain–main chain
compared with main chain–side chain and side
chain–side chain hydrogen bonds. The main chain–main chain
hydrogen bonds can form optimal configurations collectively,
such as α-helices and β-sheets. However, a hydrophilic side
chain often has several polar atoms and the dipole moments
of donor–hydrogen and/or of acceptor–acceptor antecedent
cannot be aligned optimally in hydrogen bonds simultaneously,
as shown in Figure 16. The side chain movements upon binding
accommodate hydrogen bonds between an antibody and a
lysozyme, in a similar manner to that occurring in forming a
salt bridge (Norel et al., 1997). To form the two hydrogen
bonds, Gln121 of lysozyme shifts its position and changes its
conformation during the binding. However, the alignment of
the direction along acceptor antecedent–acceptor is restricted
by the bond lengths and bond angles in the amino acids. The
movements of the side chain atoms drive them off-equilibrium
from the relaxed state. The more the hydrophilic side chains
are buried and packed together, the more frustration they
experience in their alignment to reach the free energy
global minimum state. Such frustration in the packing of the
polar/charged side chains is similar to the case of the spin-glass state (Wolynes, 1990), where spins are trapped in the
metastable glass state, rather than being in the global minimum.
In monomeric protein folding, such a problem is more likely
to be solved by excluding hydrophilic side chains to the
surface, where a polar/charged atom can easily form a high-quality hydrogen bond with the solvent owing to the flexibility
of water molecules. However, in the case of rigid protein
binding, as shown in Figure 15, hydrophilic side chains are
more likely to be buried in the interfaces to form sub-optimal
main chain–side chain and side chain–side chain hydrogen
bonds. Hence bound complexes are generally more off-minima
than monomers.
Fig. 16. Stereoview of the side chain movement during the formation of hydrogen bonds in binding. The bound structure (1vfb with a resolution of 1.8 Å) is
the FV fragment of mouse monoclonal antibody D1.3 (chain A) complexed with the hen egg lysozyme (chain C). It is shown in a combination of green (for
carbon atoms), red (for oxygen atoms) and blue (for nitrogen atoms). Gln121 in the unbound state (5lym, also with a resolution of 1.8 Å) is in pink. The
dashed lines show the hydrogen bonds. The dihedral angle Cα–Cβ–Cγ–Cδ of Gln121 is also changed from 172.5° in the unbound state to –167.9° in the bound
state. The unbound lysozyme (5lym) is matched with the bound lysozyme (chain C of 1vfb) at the Cα positions.
The difference between folding and binding is also observed
in the participation of water in the hydrogen bonding network.
There are significantly more buried water molecules in the
protein interface than in the interior of the protein. Water can
mediate between two hydrogen bond donors, or acceptors,
across the interface. A water molecule between a donor and
an acceptor across the interface can usually form good hydrogen
bonds with both atoms owing to its small size and flexibility.
If the water is removed, the donor and acceptor may not form
a hydrogen bond or only form a poor one owing to the
constraints imposed by both proteins during their binding.
Proteins compete with water molecules in binding (Ringe,
1995). The generally weaker inter-chain hydrogen bonds do
not compete with the binding of water as efficiently as the
intra-chain ones. On the other hand, in the monomer interior
the buried hydrogen bonds are predominantly of the main
chain–main chain type, which are strong and hence compete
favorably with hydrogen bonding to water. In addition, the
monomer interior, which is more hydrophobic, is typically
unfavorable for buried waters, while a more ‘friendly’ hydrophilic environment in the protein interface can easily form
hydrogen bonds with buried water. These are likely to constitute
the main reasons why more water molecules are buried in the
protein interface than in the interior of the protein.
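The expected occurrences of water-mediated bridges in Table VI can be reproduced by a random pairing model: the donor fraction is estimated from the observed bridge ends, and bridge types are then assigned binomially. The source does not state how its expected values were computed, so this is a sketch under that assumption, with hypothetical function and key names.

```python
def expected_bridges(n_dd, n_aa, n_da):
    """Expected bridge counts if donor and acceptor ends pair at random."""
    total = n_dd + n_aa + n_da
    # Each bridge has two ends; count how many of them are donors.
    p_donor = (2 * n_dd + n_da) / (2 * total)
    p_acceptor = 1.0 - p_donor
    return {
        "donor-donor": total * p_donor ** 2,
        "acceptor-acceptor": total * p_acceptor ** 2,
        "donor-acceptor": 2 * total * p_donor * p_acceptor,
    }

# Observed occurrences from Table VI.
exp_bridges = expected_bridges(617, 1756, 1688)
print({k: round(v, 1) for k, v in exp_bridges.items()})

# 4061 bridged atom pairs shared among 1070 waters, as quoted in the text.
pairs_per_water = 4061 / 1070
print(round(pairs_per_water, 1))
```

Under this model the donor–donor and acceptor–acceptor bridges are observed more often than expected, and donor–acceptor bridges less often, in line with the interpretation that interfacial waters preferentially mediate pairs that cannot hydrogen bond directly.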
The differences between folding and binding, reflected in
the quality of the hydrogen bonds, further support the notion
that protein complexes often do not reach the global
energy minima (Lin et al., 1995; Xu et al., 1997). The large
number of hydrophilic side chains buried across the interfaces
are footprints of rigid-body binding. These, in turn, serve as
a clear mark of a metastable state, manifested both in more
hydrogen bonds involving side chains and in more bound
water molecules buried in the interface. If two complexed
proteins were allowed to undergo a conformational change,
optimizing their bound structure freely, such footprints could
disappear with the complex reaching its hypothetical
global minimum state. However, the high kinetic barrier
prevents such an occurrence from happening (Xu et al., 1997).
How close a bound complex is to the presumed
global minimum state probably varies from case to case, owing
to the diversity of biological systems. Some protein interfaces
are dominated by the main chain–main chain hydrogen bonds.
For example, in the complex between subtilisin and chymotrypsin inhibitor, as shown in Figure 4, eight out of 10 interfacial
hydrogen bonds are of the main chain–main chain type. The
complex may be close to its global minimum. On the other
hand, many other interfaces have few main chain–main chain
hydrogen bonds. In the antibody–antigen complex 3hfl (FAB
fragment HyHEL-5 complexed with lysozyme), only one out
of 11 interfacial hydrogen bonds is of the main chain–main
chain type, with many bound water molecules buried at the
interface. In this case, the bound state is expected to be far
away from its global minimum state. This may originate from
the function of the antibody. The small hypervariable regions
need to be able to bind an immense variety of antigens
within the same structural framework (Creighton, 1993). More
hydrophilic side chains are likely to be involved in the binding
interface to be available for mutations that can adapt the
hydrogen bonding patterns to different antigens. This restriction
in the allowed conformations imposes further constraints, in
addition to the rigid body binding. Hence for the complex it
is even more difficult to attain its global minimum state.
Electrostatic complementarity and binding specificity
Electrostatic complementarity across the binding interface is
revealed in the formation of hydrogen bonds and salt bridges.
Most highly buried donors and acceptors form hydrogen bonds.
The charge distribution indicates that opposite charge pairs
are substantially more favorable than like charges. In some
cases a minor perturbation of the hydrogen bond/charge
network across the binding interface may substantially destabilize the complex (Chacko et al., 1995). Backbone–backbone
hydrogen bonds play an important role in protein–protein
interactions. Protein backbones of different chains can associate
to form complementary β-sheets across the bound interface.
This is in agreement with a recent statistical analysis by Vakser
(1996), showing that the complementarity between backbones
may facilitate the initial phase of the binding.
Binding specificity, defined by electrostatic complementarity,
is clearly revealed in the conserved pattern of hydrogen
bonding network as illustrated in Figure 12. A requirement for
a conserved hydrogen bonding network has also been observed
in several enzyme–ligand complexes. The selectivity of the
binding of the protein kinase family is achieved through
preserved hydrogen bonds (Xu et al., 1996). We have recently
predicted the binding between the yeast chorismate mutase
and a transition state analog, and compared it with the binding
between a bacterial chorismate mutase and the same ligand
(Lin et al., 1997). Our study shows that the binding function
is conserved via a common mechanism with common salt
bridges and hydrogen bonds, rather than via a conserved
sequence or global structure.
Nevertheless, the atom-based complementarity is not an
absolute requirement. There are some fully buried hydrogen
bond donors/acceptors which do not form any hydrogen bonds.
There are also some like charge pairs. In the case of the HIV
protease (1hvi in Figure 14c), it is known that only one of the
carboxyls of Asp25(A) and Asp25(B) is ionized, although the
position of the proton has not been determined experimentally
(Creighton, 1993). A highly unfavorable like charge pair can
shift the pKa of the ionizable groups so that either one or both
ions are neutral. This is likely the case for 3sc2 shown in
Figure 14b also. The like charge pairs may also form triads
with their opposite charges, to obtain partial electrostatic
compensation, as depicted in Figure 14a and d. If a like charge
pair is solvent accessible, it may attract counter ions in the
solvent to balance its charges.
Implications for the docking problem
Our results are expected to aid in identifying potential binding
sites and in scoring binding modes predicted by geometrically
based methods. Geometry-based docking approaches generally
yield a large number of potential ligand binding conformations
(Norel et al., 1994, 1995; Fischer et al., 1995). Current scoring
schemes often fail to predict the correct binding modes
(Lybrand, 1995). Hydrogen bonds and salt bridges, as contributors to strong physical interactions, comprise an important
component in the assessment of binding (Meyer et al., 1996).
By using statistically derived data on hydrogen bonds and salt
bridges across protein interfaces, one may develop a scoring
system with a strong chemical relevance to identify the binding
modes. This approach can be particularly useful in the flexible
docking problem, where receptors and/or ligands may change
their shape upon binding, making it extremely difficult to
predict correct binding conformations utilizing geometrically
based methods. Since the surface patterns of receptors and
ligands, in terms of atom composition, change very little
upon binding, it is possible to use complementarity between
hydrogen bonding donors and acceptors, and between opposite
charges, and their relationship to the percentage burial in the
formed interface, to distinguish bound from unbound configurations. Conserved patterns of hydrogen bonds may also be
utilized. Hence the structure of a complex between a receptor
and a ligand may be used as a template to predict the binding
mode between another ligand and the same receptor if the two
inhibitors are known to bind at the same region of the receptor.
The distribution of the distances between the hydrogen bond
donors and acceptors also sheds some light on the scoring of
the docked predictions. The van der Waals radii for both
nitrogen and oxygen atoms in the CHARMM parameter set are
1.6 Å. Since the average distance between a donor and an
acceptor is 2.92 Å, with a standard deviation of 0.24 Å, most
hydrogen bonds have a distance smaller than the sum of the
van der Waals radii of the donor and acceptor. This is
understandable owing to the attractive interaction between the
donor and acceptor in a hydrogen bond. However, such a
strong penetration may affect the quality of the docked
predictions. Typically, docking programs penalize any van der
Waals penetrations (Norel et al., 1994, 1995; Fischer et al.,
1995). Our study indicates that the scoring should not penalize
a reasonable penetration between hydrogen bond donors and
acceptors. This is particularly important when the ligand is
small and very sensitive to the surface complementarity.
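One way to act on this observation in a scoring function is to exempt donor–acceptor pairs from the van der Waals overlap penalty when their separation is within the range seen for hydrogen bonds. The sketch below is illustrative only: the radii and the donor–acceptor statistics come from the text, whereas the function names, the linear penalty and the two-standard-deviation tolerance (`n_sd`) are hypothetical choices.

```python
VDW_RADIUS = {"N": 1.6, "O": 1.6}   # Å, values quoted in the text
HBOND_MEAN = 2.92                   # Å, average donor-acceptor distance
HBOND_SD = 0.24                     # Å, its standard deviation

def clash_penalty(distance, elem1, elem2, is_donor_acceptor_pair, n_sd=2.0):
    """Overlap penalty in Å, waived for plausible hydrogen-bonded pairs."""
    overlap = max(0.0, VDW_RADIUS[elem1] + VDW_RADIUS[elem2] - distance)
    # Hypothetical tolerance: accept donor-acceptor contacts no shorter
    # than n_sd standard deviations below the average distance.
    if is_donor_acceptor_pair and distance >= HBOND_MEAN - n_sd * HBOND_SD:
        return 0.0
    return overlap

typical_hbond = clash_penalty(2.92, "N", "O", True)    # exempted
same_contact = clash_penalty(2.92, "N", "O", False)    # penalized overlap
too_close = clash_penalty(2.0, "N", "O", True)         # still penalized
print(typical_hbond, round(same_contact, 2), round(too_close, 2))
```

The sum of the two radii is 3.2 Å, so an average hydrogen bond at 2.92 Å would otherwise be scored as a 0.28 Å penetration.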
Another practical problem in docking is buried water molecules across the binding interface. Our study shows that there
are many bound waters across protein–protein interfaces. Water
can also mediate protein–small ligand binding (Fauman et al.,
1994). However, current docking methods do not include
water. This may decrease the docking performance in some
cases, notably in the antibody–antigen binding (Fischer et al.,
1995; Meyer et al., 1996), where interfacial water molecules
are present more extensively than in most other complexes.
Our study on the statistical analysis of the interfacial water,
and further work along these lines, may help to locate water
molecules in binding and enhance docking predictions.
Conclusion
Although the types of interactions at protein–protein interfaces
resemble those observed in the interior of protein monomers,
there are also some inherent differences. In both cases the
hydrophobic effect plays an important role. Nevertheless, the
extent of the hydrophobic effect in the interior of protein
monomers is significantly larger than that observed at protein–
protein interfaces (Tsai et al., 1997). On the other hand, while
salt bridges destabilize protein cores, they may contribute to
stabilizing protein associations. Inspection of the types of residues
at the interfaces, as compared with the monomers, indicates
that polar and charged residues constitute a larger percentage
than in monomers. Here, we examined the hydrogen bonds
and the salt bridges. To this end, we carried out an extensive
analysis of the hydrogen bonds and of the salt bridges in a
collection of 319 non-redundant protein–protein interfaces,
assembled previously from protein X-ray structures.
We found that, on average, there are 10.7 hydrogen bonds
and 2.0 salt bridges per interface. Electrostatic complementarity is
found for both charges and hydrogen bonding donors/acceptors.
However, 17.4% of fully buried donors or acceptors in high-resolution structures do not form any hydrogen bonds, and
some like charges are at a close distance. Polar atoms on the
backbone have a strong tendency to form hydrogen bonds
with backbone atoms across the interface, and some main
chain–main chain hydrogen bonds can form β-sheets.
In particular, our results indicate that the quality of the
hydrogen bonds in the interfaces is not as good as that generally
observed within the chains. This is reflected both in the angular
distributions and in the significantly larger number of water
molecules mediating hydrogen bonds at the interfaces. The
lower quality of interfacial hydrogen bonds is attributed to the
large number of polar/charged side chains buried across protein
interfaces, which is enforced by the hydrophilic surface of the
monomers and the rigid body binding. Rigid body protein–
protein associations can be described by a three-state model,
where the unfolded chains first fold to their native, lowest
energy configurations. Subsequently, the folded chains associate, to form their bound complexes. Although relatively minor
conformational rearrangements occur, basically the already
folded chains have only three rotational and three translational
degrees of freedom to optimize their binding, leaving many
hydrophilic side chains and water molecules buried in the
interface. This is unlike the case of protein folding. Hence the
bound complex is more likely to exist in a metastable state,
rather than at the global minimum.
The results obtained in this study further enhance our
understanding of the similarities and of the differences between
folding and binding. They bear on the specificity of protein–
protein associations. As such, they are useful for inhibitor
design and for scoring multiple docked conformations predicted
by the geometrically based methods.
Here we have examined protein–protein binding, rather than
protein–small molecule interactions or protein–nucleic acid
associations. In the latter case, hydrogen bonds conceivably often play a dominant role, and hence their patterns can be
used more directly to aid in distinguishing native from non-native binding orientations. Their patterns of interactions not
only dictate specific recognition, but also provide much of
the stability.
Acknowledgements
We thank Drs Jie Liang, Shuo L.Lin, Aijun Li, David Covell, Anders Wallqvist,
Saraswathi Vishveshwara and, in particular, Jacob Maizel, for helpful discussions. We thank the personnel at the Frederick Cancer Research and Development Center for their assistance. All the calculations presented in this paper
were carried out on Silicon Graphics workstations operated by the Frederick
Biomedical Supercomputing Center, National Cancer Institute. The research
of R.Nussinov was sponsored by the National Cancer Institute, DHHS, under
Contract No. 1-CO-74102 with SAIC, and in part by grant No. 95-00208
from the BSF, Israel, by a grant from the Israel Science Foundation administered
by the Israel Academy of Sciences, and by the Rekanati Fund. The content
of this publication does not necessarily reflect the views or policies of the
Department of Health and Human Services, nor does mention of trade names, commercial
products or organizations imply endorsement by the US Government. By
acceptance of this paper, the publisher or recipient acknowledges the right of
the US Government to retain a non-exclusive, royalty-free license in and to
any copyright covering the paper.
References
Baker,E.N. and Hubbard,R.E. (1984) Prog. Biophys. Mol. Biol., 44, 97.
Barlow,D.J. and Thornton,J.M. (1983) J. Mol. Biol., 168, 867–885.
Barlow,D.J. and Thornton,J.M. (1986) Biopolymers, 25, 1717–1733.
Bartlett,P.A. and Marlowe,C.K. (1984) Trends Biochem. Sci., 9, 145–147.
Bennett,M.J., Choe,S. and Eisenberg,D. (1994) Proc. Natl Acad. Sci. USA,
91, 3127–3131.
Bennett,M.J., Schlunegger,M.P. and Eisenberg,D. (1995) Protein Sci., 4,
2455–2468.
Bernstein,F.C., Koetzle,T.F., Williams,G.J.B., Meyer,E.F., Brice,M.D.,
Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol.
Biol., 112, 535–542.
Betzel,C., Saenger,W., Hingerty,B.E. and Brown,G.M. (1984) J. Am. Chem.
Soc., 106, 7545–7557.
Bhat,T.N. et al. (1994) Proc. Natl Acad. Sci. USA, 91, 1089–1093.
Chacko,S., Silverton,E., Kam-Morgan,L., Smith-Gill,S., Cohen,G. and
Davies,D. (1995) J. Mol. Biol., 245, 261–274.
Cherfils,J., Duquerroy,S. and Janin,J. (1991) Proteins: Struct. Funct. Genet.,
11, 271–280.
Creighton,T.E. (1993) Proteins. 2nd edn. Freeman, San Francisco.
Fauman,E.B., Rutenber,E.E., Maley,G.F., Maley,F. and Stroud,R.M. (1994)
Biochemistry, 33, 1502–1511.
Fersht,A.R. (1984) Trends Biochem. Sci., 9, 145–147.
Finkelstein,A.V., Gutin,A.M. and Badretdinov,A.Y. (1995) Proteins: Struct.
Funct. Genet., 23, 151–162.
Fischer,D., Lin,S.L., Wolfson,H.J. and Nussinov,R. (1995) J. Mol. Biol., 248,
459–477.
Gandini,D., Gogioso,L., Bolognesi,M. and Bordo,D. (1996) Proteins: Struct.
Funct. Genet., 24, 439–449.
Gao,J., Mammen,M. and Whitesides,G.M. (1996) Science, 272, 535–537.
Helms,V. and Wade,R.C. (1995) Biophys. J., 69, 810–824.
Hendsch,Z.S., Jonsson,T., Sauer,R.T. and Tidor,B. (1996) Biochemistry, 35,
7621–7625.
Hendsch,Z.S. and Tidor,B. (1994) Protein Sci., 3, 211–226.
Honig,B. and Yang,A.S. (1995) Adv. Protein Chem., 46, 27–59.
Hubbard,S. (1992) ACCESS. EMBL.
Hubbard,S.J., Campbell,S.F. and Thornton,J.M. (1991) J. Mol. Biol., 220,
507–530.
Janin,J. and Chothia,C. (1990) J. Biol. Chem., 265, 16027–16030.
Janin,J. and Rodier,F. (1995) Proteins: Struct. Funct. Genet., 23, 580–587.
Jeffrey,G.A. and Saenger,W. (1991) Hydrogen Bonding in Biological Structure.
Springer, Berlin.
Korn,A.P. and Burnett,R.M. (1991) Proteins: Struct. Funct. Genet., 9, 37–55.
Lee,B. and Richards,F.M. (1971) J. Mol. Biol., 55, 379–400.
Lin,S.L., Tsai,C.J. and Nussinov,R. (1995) J. Mol. Biol., 248, 151–161.
Lin,S.L., Xu,D., Li,A., Roiterst,M., Wolfson,H.J. and Nussinov,R. (1997)
Lybrand,T.P. (1995) Curr. Opin. Struct. Biol., 5, 224–228.
Marqusee,S. and Sauer,R.T. (1994) Protein Sci., 3, 2217–2225.
McDonald,I., Naylor,D., Jones,D. and Thornton,J. (1993) HBPLUS: Hydrogen
Bond Calculator Version 2.25. University College London, London.
McDonald,I.K. and Thornton,J.M. (1994) J. Mol. Biol., 238, 777–793.
Meyer,E. (1992) Protein Sci., 1, 1543–1562.
Meyer,M., Wilson,P. and Schomburg,D. (1996) J. Mol. Biol., 264, 199–210.
Molecular Simulations (1994) QUANTA 4.0. Molecular Simulations,
Burlington, MA.
Musafia,B., Buchner,V. and Arad,D. (1995) J. Mol. Biol., 254, 761–770.
Myers,J.K. and Pace,C.N. (1996) Biophys. J., 71, 2033–2039.
Norel,R., Lin,S.L., Wolfson,H.J. and Nussinov,R. (1994) Biopolymers, 34,
933–940.
Norel,R., Lin,S.L., Wolfson,H.J. and Nussinov,R. (1995) J. Mol. Biol., 252,
263–273.
Norel,R., Lin,S.L., Xu,D., Wolfson,H.J. and Nussinov,R. (1997) Submitted.
Novotny,J. and Sharp,K. (1992) Prog. Biophys. Mol. Biol., 58, 203–224.
Rashin,A. and Honig,B. (1984) J. Mol. Biol., 173, 515–521.
Ringe,D. (1995) Curr. Opin. Struct. Biol., 5, 825–829.
Shirley,B.A., Stanssens,P., Hahn,U. and Pace,C.N. (1992) Biochemistry, 31,
725–732.
Tissot,A.C., Vuilleumier,S. and Fersht,A.R. (1996) Biochemistry, 35,
6786–6794.
Tsai,C.J. (1996) Protein–Protein Interface. Laboratory of Mathematical
Biology World Wide Web (WWW) page, http://www-lmmb.ncifcrf.gov/tsai/.
Tsai,C.J., Lin,S.L., Wolfson,H. and Nussinov,R. (1996) J. Mol. Biol., 260,
604–620.
Tsai,C.J., Lin,S.L., Wolfson,H. and Nussinov,R. (1997) Protein Sci., 6, 53–64.
Tsai,C.J. and Nussinov,R. (1997) Protein Sci., 6, 24–42.
Vakser,I.A. (1996) Protein Engng, 9, 741–744.
Warshel,A. and Russell,S.T. (1984) Q. Rev. Biophys., 17, 283–422.
Wilson,C., Mau,T., Weisgraber,K.H., Wardell,M.R., Mahley,R.W. and
Agard,D.A. (1994) Structure, 2, 713–718.
Wolynes,P.G. (1990) In Stein,D. (ed.), Spin Glasses and Biology. World
Scientific, Singapore.
Xu,D., Lin,S.L. and Nussinov,R. (1997) J. Mol. Biol., 265, 68–84.
Xu,R.M., Carmel,G., Kuret,J. and Cheng,X. (1996) Proc. Natl Acad. Sci.
USA, 93, 6308–6313.
Received January 22, 1997; revised March 28, 1997; accepted May 5, 1997