Protein Engineering vol.10 no.9 pp.999–1012, 1997

Hydrogen bonds and salt bridges across protein–protein interfaces

Dong Xu¹, Chung-Jung Tsai² and Ruth Nussinov¹,³,⁴

¹Laboratory of Mathematical Biology, IRSP, SAIC Frederick, NCI–FCRDC, Frederick, MD 21702-1201, ²Laboratory of Mathematical Biology, NCI–FCRDC, PO Box B, Frederick, MD 21702-1201, USA and ³Sackler Institute of Molecular Medicine, Tel Aviv University, Tel Aviv 69978, Israel

⁴To whom correspondence should be addressed, at the first address

To understand further, and to utilize, the interactions across protein–protein interfaces, we carried out an analysis of the hydrogen bonds and of the salt bridges in a collection of 319 non-redundant protein–protein interfaces derived from high-quality X-ray structures. We found that the geometry of the hydrogen bonds across protein interfaces is generally less optimal and has a wider distribution than typically observed within the chains. This difference originates from the more hydrophilic side chains buried in the binding interface than in the folded monomer interior. Protein folding differs from protein binding. Whereas in folding practically all degrees of freedom are available to the chain to attain its optimal configuration, this is not the case for rigid binding, where the protein molecules are already folded, with only six degrees of translational and rotational freedom available to the chains to achieve their most favorable bound configuration. These constraints enforce many polar/charged residues buried in the interface to form weak hydrogen bonds with protein atoms, rather than strongly hydrogen bonding to the solvent. Since interfacial hydrogen bonds are weaker than the intra-chain ones to compete with the binding of water, more water molecules are involved in bridging hydrogen bond networks across the protein interface than in the protein interior.
Interfacial water molecules both mediate non-complementary donor–donor or acceptor–acceptor pairs, and connect non-optimally oriented donor–acceptor pairs. These differences between the interfacial hydrogen bonding patterns and the intra-chain ones further substantiate the notion that protein complexes formed by rigid binding may be far away from the global minimum conformations. Moreover, we summarize the pattern of charge complementarity and of the conservation of hydrogen bond network across binding interfaces. We further illustrate the utility of this study in understanding the specificity of protein–protein associations, and hence in docking prediction and molecular (inhibitor) design.

Keywords: bound water/hydrogen bond/molecular recognition/protein association/salt bridge/statistical analysis

© Oxford University Press

Introduction

Hydrogen bonds and salt bridges play a central role in protein binding. The protein surface is studded with many hydrophilic residues. It has been reported that the average charge density of native proteins is 1.4 charged groups per 100 Å² of protein surface (Barlow and Thornton, 1986). Hence the binding interface, which basically consists of protein surfaces, is generally more hydrophilic than the protein interior. In addition, as early studies have shown, protein active sites often provide electrostatic complementarity to the charge distribution of the binding substrates (Warshel and Russell, 1984; Cherfils et al., 1991; Novotny and Sharp, 1992; Creighton, 1993). Taken together, binding interfaces tend to form more hydrogen bonds and salt bridges than protein interiors. Aside from the different densities, the hydrogen bonds and salt bridges across the binding interfaces and in the protein interiors also differ in their relative contributions to the energetics. As shown in our earlier studies (Xu et al., 1997), electrostatic interactions play a more important role in binding than in folding.
Hence interfacial hydrogen bonds and salt bridges, as the major contributors to the electrostatic interactions between proteins, play a more important role in binding than the intra-monomer hydrogen bonds and salt bridges in folding. Furthermore, unlike intra-chain interactions in folded proteins, bound water molecules often help mediate protein– protein binding (Creighton, 1993; Bhat et al., 1994). Consequently, ordered water molecules bridging the hydrogen bond network between proteins contribute to the stabilization of the complexes (Bhat et al., 1994; Helms and Wade, 1995). Protein–protein association differs from protein folding. In folding the backbone of the folding polypeptide chain is in principle free to explore all configurations, leading to an optimally folded monomer structure. All degrees of freedom are available to the backbone in its conformational search. That, however, is not the case in the binding of two protein molecules. For most reversible associations, the structures of the proteins do not change significantly upon complex formation and follow a three-state model in their binding process. Initially, the unfolded chains fold to their native configurations. Subsequently, these already folded proteins associate to form their bound state. In this case, although some conformational rearrangements are observed, only rather minor adjustments are generally made to optimize the binding, and most of these are in side chain movements (Norel et al., 1997). This basic difference between folding and binding implies that although the types of interactions involved in both processes are similar, their relative contributions necessarily differ. We have already shown that the hydrophobic effect, although critically important in binding, is not as dominant as in folding (Tsai et al., 1997). Furthermore, we have shown that a large variability is observed among the interfaces. 
This can be straightforwardly rationalized, as each of the chains has first folded to attain its most stable configuration, with its hydrophobic residues largely buried in the core. The binding monomers seek to optimize their interacting interfaces; however, they have only six rotational and translational degrees of freedom to explore. In other words, the complex only reaches the optimal state in a subset of the conformation space. In addition, the packing of the hydrophobic residues in the interior has already resulted in pushing the hydrophilic residues to the surface, and hence in a priori limiting the extent of the potential hydrophobic effect at the interface. In approaching the role played by hydrogen bonds in folding as compared with binding, one needs to bear in mind another fundamental difference. In folding, the buried polar groups in the backbone of protein chains are generally satisfied by forming hydrogen bonds. In doing so, depending on the sequence, a particular secondary structure is chosen. In binding, the secondary structures are already formed. While a β-sheet is sometimes extended across the binding interface, an α-helix can only be formed by a single, continuous chain. These considerations have several implications. First, unlike in the case of folding, in rigid binding a complex may exist in a configuration far from its global minimum (Lin et al., 1995; Xu et al., 1997). Second, on the practical side, this immediately suggests a reason for the difficulty in devising approaches which would energetically distinguish between ‘correct’ and ‘incorrect’ bound configurations. The energy gap may be expected to be significantly smaller, making it very difficult to score predicted complex structures by using energetic parameters (Finkelstein et al., 1995). However, such predictions are extremely important in molecular and drug design. The question of the hydrogen bonds is particularly relevant in this regard.
Certainly, owing to the larger fraction of polar, and charged, surface residues, their contribution to stabilizing the interacting molecules is important. However, here geometry plays a key role in determining their quality. Owing to the constraints imposed by bond lengths and bond angles, the geometries of the hydrogen bonds are unlikely to be optimized. The extent and ranges of these, and the role played by the water molecules in this regard, have implications in ligand design. Hydrogen bonds and salt bridges are particularly essential in determining binding specificity (Fersht, 1984; Honig and Yang, 1995). Specific protein–protein association is a pattern recognition process. There are two conditions for binding to occur, namely geometrical complementarity and stability in energetics. The hydrophobic effect, hydrogen bonds and salt bridges all play very important roles in energetics. A hydrogen bond or a salt bridge can provide favorable free energy to the binding (Bartlett and Marlowe, 1984; Gao et al., 1996; Xu et al., 1997). On the other hand, an unfulfilled hydrogen bond donor/acceptor, or an isolated charge without forming a salt bridge, when buried in the protein interface, could substantially destabilize binding, owing to the desolvation effect. Such a contrast in energetics contributes to a high selectivity in matching the hydrogen bonds and salt bridges between proteins, and confers binding specificity. A statistical analysis of hydrogen bonds and of salt bridges across protein–protein interfaces bears on the issue of binding specificity, and hence has direct implications to rational inhibitor design. Studies of inter-chain hydrogen bonds and salt bridges can provide an insight into the role of the hydrogen bonds and salt bridges in domain (folding unit) packing within a monomer as well. 
The ‘docking’ of folding units, or of compact, hydrophobic, independently folding nuclei (Tsai and Nussinov, 1997), during the protein folding process is similar to that taking place in binding. Both involve recognition, whether inter- or intramolecular. Such a similarity has been elegantly revealed in the three-dimensional domain swapping, which sometimes takes place during oligomer assembly. There, one domain of a monomeric protein is replaced by the same domain from an identical chain of its twin protein (Bennett et al., 1994, 1995). The special role of hydrogen bonds and salt bridges between protein domains has been noticed. For example, some subunits in allosteric proteins bind predominantly through ion pairs and hydrogen bonds (Korn and Burnett, 1991), and a salt bridge network can connect protein subunits or join two secondary structures to form quaternary structures (Musafia et al., 1995). Although we did not carry out a study for the domain interfaces as we have done here for the protein interfaces, they may be expected to behave similarly. In terms of their inter-domain interface composition, we expect them to be composed of less polar residues as compared with inter-chain interfaces, although appreciably more polar than is observed in protein cores. The prevailing view holds that the hydrophobic effect has a dominant role in stabilizing protein structures. However, the contribution of hydrogen bonds and of salt bridges has recently drawn increasing attention (Shirley et al., 1992; Marqusee and Sauer, 1994; Wilson et al., 1994; Hendsch et al., 1996; Myers and Pace, 1996; Tissot et al., 1996; Xu et al., 1997). Statistical analyses have been carried out for hydrogen bonds (Baker and Hubbard, 1984; Jeffrey and Saenger, 1991; McDonald and Thornton, 1994) and salt bridges (Barlow and Thornton, 1983; Rashin and Honig, 1984; Hendsch and Tidor, 1994; Musafia et al., 1995; Gandini et al., 1996) in globular proteins.
Few studies have been carried out for those across protein binding interfaces. In particular among these, Janin and Chothia (1990) analyzed hydrogen bonds for 15 protein–protein interfaces. We have compiled a non-redundant dataset of protein–protein interfaces (Tsai, 1996; Tsai et al., 1996), enabling us to address the questions posed above. By excluding intra-chain hydrogen bonds and salt bridges, many of which are enforced by chemical bonds or by secondary structures, we can obtain exclusive information on inter-chain associations. Here we provide a comprehensive statistical investigation of hydrogen bonds, salt bridges and water molecules across protein interfaces, utilizing a substantially larger dataset. We describe our results, their implications and utility. In the next section, we describe the statistical methods which we employed. In the subsequent section, we present results of the statistical analysis of hydrogen bonds and salt bridges across the 319 protein interfaces, illustrating some structural details. We then discuss our observations, highlighting the similarities and the differences with the chains, the binding specificity, and the implications to the energetics of protein–protein association. Finally, we summarize our conclusions.

Methods

In this section, we introduce the selected dataset and the methods we employed in its statistical analysis.

Dataset selection

To obtain meaningful statistical properties of hydrogen bonds and salt bridges across protein interfaces, one needs a high quality, non-redundant experimental dataset. For this purpose, starting with 376 non-homologous interfaces (Tsai, 1996), generated from 1629 two-chain interfaces (Tsai et al., 1996) present in the Protein Data Bank (PDB) (Bernstein et al., 1977), we kept only the X-ray structures with a resolution under 3.0 Å. We removed all others, including theoretical models and NMR structures, whose quality is difficult to assess.
There are 319 interfaces left in our dataset, as shown in Table I. Among them, 54 interfaces have a resolution of 1.8 Å or less, as listed in Table II. In parallel, we compared the interfaces with the protein interiors. The protein interiors were analyzed by using the 550 chains involved in the interface dataset of Table I.

Table I. PDB codes and chains of the protein interfaces employed
104l:AB 1aal:AB 1aap:AB 1aar:AB 1aaz:AB 1aab:CD 1afn:BC 1ake:AB 1alk:AB 1ank:AB
1aoz:AB 1atn:AD 1atp:EI 1bab:AC 1bab:AD 1bao:AB 1bao:BC 1bar:AB 1bbb:AB 1bbb:AC
1bbh:AB 1bbp:BC 1bbp:BD 1bbp:CD 1bbr:EK 1bbr:HE 1bbr:KG 1bbr:KN 1bbr:LE 1bbt:12
1bbt:14 1bgs:BF 1bgs:FG 1bov:AE 1bsc:BC 1bsr:AB 1c2r:AB 1cau:AB 1cax:CF 1cdd:AB
1cdt:AB 1chm:AB 1cho:EI 1cmb:AB 1col:AB 1cos:AB 1cpc:AB 1cpc:AK 1cse:EI 1csg:AB
1cth:AB 1d66:AB 1dfn:AB 1dhf:AB 1dhj:AB 1dsb:AB 1fc1:AB 1fc2:CD 1fcb:AB 1fia:AB
1fki:AB 1fvc:BD 1fvd:AC 1fxi:AD 1gd1:PQ 1gd1:PR 1gdh:AB 1ggi:LJ 1gla:FG 1glu:AB
1gma:AB 1gmf:AB 1gmq:AB 1gp1:AB 1gpa:AD 1hds:AC 1hge:AC 1hge:AD 1hge:BF 1hge:DE
1hge:EF 1hgt:HI 1hgt:LH 1hhi:BD 1hjj:AB 1hil:CD 1hle:AB 1hpl:AB 1hrh:AB 1hsa:AD
1hsl:AB 1hst:AB 1hvi:AB 1igf:LJ 1igf:LM 1isu:AB 1ith:AB 1jhl:HA 1jhl:LA 1l97:AB
1ldn:AB 1ldn:FG 1lga:AB 1lld:AB 1lmb:34 1log:AD 1lta:DC 1lta:EC 1lts:AC 1lts:DE
1lts:FC 1lys:AB 1mbl:AB 1mch:AB 1min:AD 1min:BD 1min:CD 1mol:AB 1ncb:NH 1ncc:NL
1nco:AB 1ndl:AC 1nip:AB 1nsc:AB 1opa:AB 1opb:AD 1ova:AB 1ova:BD 1ovo:AB 1ovo:BC
1ovo:CD 1paf:AB 1per:LR 1pfk:AB 1plf:AB 1plf:AC 1plf:BD 1pob:AB 1poe:AB 1pox:AB
1pp2:RL 1prc:CH 1prc:CL 1prc:CM 1prc:LH 1prc:LM 1prc:MH 1psa:AB 1psh:AB 1psp:AB
1pts:AB 1pya:AC 1pya:DE 1pya:DF 1pya:EF 1pyd:AB 1pyg:AB 1r09:13 1r09:24 1rag:BD

Hydrogen bond analysis

Each hydrogen bond can be characterized by the variables defined in Figure 1. The software used to analyze hydrogen bonds is HBPLUS (McDonald et al., 1993; McDonald and Thornton, 1994).
The program determines the positions of missing hydrogens in the PDB and checks each donor–acceptor pair against the following geometric criteria: the maximum distances are 3.9 Å between donor and acceptor (d < 3.9 Å) and 2.5 Å between acceptor and hydrogen (r < 2.5 Å); the minimum angles are 90.0° for the angle of donor–hydrogen–acceptor (θ > 90.0°), for the angle of donor–acceptor–acceptor antecedent (φ > 90.0°) and for the angle of hydrogen–acceptor–acceptor antecedent (γ > 90.0°) (Baker and Hubbard, 1984). Amino-aromatic hydrogen bonds are not taken into account in our analysis.

Definition of salt bridges

The salt bridge is evaluated according to the distance between the donor atoms (Nζ of Lys, Nε, Nη1 and Nη2 of Arg, Nδ1 and Nε2 of His and the amide N of the N-terminus) and the acceptor atoms (Oε1 and Oε2 of Glu, Oδ1 and Oδ2 of Asp and the two carboxyl oxygen atoms of the C-terminus). If the distance is ≤4.0 Å, the pair is counted as a salt bridge (Barlow and Thornton, 1983). When the geometry is acceptable, a salt bridge is also counted as a hydrogen bonding pair.
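The geometric criteria and the salt bridge cutoff quoted above can be written as two simple predicates. The following is a minimal sketch only: it is not the HBPLUS implementation, and the class and function names (`HBondGeometry`, `is_hydrogen_bond`, `is_salt_bridge`) are hypothetical names introduced for illustration. It assumes the five variables of Figure 1 and the charged-atom distance have already been measured.

```python
from dataclasses import dataclass

# Cutoffs quoted in the text (Baker and Hubbard, 1984; Barlow and Thornton, 1983).
MAX_D = 3.9            # donor-acceptor distance d, in Å
MAX_R = 2.5            # hydrogen-acceptor distance r, in Å
MIN_ANGLE = 90.0       # lower bound for theta, phi and gamma, in degrees
SALT_BRIDGE_MAX = 4.0  # charged donor to charged acceptor distance, in Å


@dataclass
class HBondGeometry:
    """Hypothetical container for the five variables defined in Figure 1."""
    d: float      # donor-acceptor distance (Å)
    r: float      # hydrogen-acceptor distance (Å)
    theta: float  # donor-hydrogen-acceptor angle (degrees)
    phi: float    # donor-acceptor-acceptor antecedent angle (degrees)
    gamma: float  # hydrogen-acceptor-acceptor antecedent angle (degrees)


def is_hydrogen_bond(g: HBondGeometry) -> bool:
    """True when all five geometric criteria quoted in the text are met."""
    return (
        g.d < MAX_D
        and g.r < MAX_R
        and g.theta > MIN_ANGLE
        and g.phi > MIN_ANGLE
        and g.gamma > MIN_ANGLE
    )


def is_salt_bridge(distance: float) -> bool:
    """True when a charged donor/acceptor atom pair lies within 4.0 Å."""
    return distance <= SALT_BRIDGE_MAX
```

A pair of charged atoms that passes both predicates would be counted both as a salt bridge and as a hydrogen bond, in line with the last sentence of the definition.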
Table I (continued)
1rbb:AB 1rcm:AB 1rhg:AC 1rhg:BC 1rib:AB 1rn1:AC 1rtp:12 1rtp:23 1sac:AB 1scm:AB
1scm:AC 1scm:BC 1sdy:AD 1sdy:BD 1shf:AB 1slt:AB 1sos:AE 1sos:FE 1sos:FG 1sps:CF
1srd:AD 1srn:AB 1sry:AB 1stf:EI 1tbe:AB 1tbp:AB 1tcb:AB 1tet:HP 1tet:LP 1tgx:AB
1tme:12 1tme:13 1tme:34 1tnf:AB 1tpk:BC 1tpl:AB 1trk:AB 1trm:AB 1tro:AC 1trz:BD
1tta:AB 1vfa:AB 1vfb:AC 1vfb:BC 1vmo:AB 1vsg:AB 1wsy:AB 1xim:AB 1xim:AC 1xim:AD
1yca:AB 201l:AB 256b:AB 2aai:AB 2abx:AB 2aza:AB 2azu:AD 2azu:BD 2bbk:HJ 2bbk:HL
2bbk:LJ 2ccy:AB 2cga:AB 2cgr:LH 2cwg:AB 2fb4:LH 2gst:AB 2hhm:AB 2hmz:CD 2hpd:AB
2kai:AI 2ltn:AC 2ltn:CD 2mlt:AB 2msb:AB 2mta:AC 2mta:HA 2mta:LA 2nck:RL 2ohx:AB
2pcb:AB 2pcb:AC 2pcc:AC 2pcc:CD 2pfk:AB 2phi:AB 2pka:AY 2pka:BY 2plv:12 2plv:14
2plv:23 2plv:34 2pol:AB 2rsl:AB 2rsl:BC 2scp:AB 2sod:BG 2spc:AB 2tbv:AB 2tmd:AB
2tpr:AB 2trx:AB 2tsc:AB 2utg:AB 3aah:AC 3aah:CD 3eca:AB 3eca:BC 3eca:BD 3gap:AB
3hhr:AB 3hhr:AC 3hhr:BC 3ink:CD 3ins:AB 3lad:AB 3mcg:12 3mds:AB 3mon:CD 3p2p:AB
3rp2:AB 3rub:LS 3sc2:AB 3sdh:AB 3sgb:EI 4azu:AB 4azu:BC 4cha:AB 4cts:AB 4fbp:AC
4fbp:AD 4fbp:CD 4htc:HI 4rub:AB 4rub:BC 4rub:BD 4rub:BV 4rub:CV 4rub:ST 4sbv:AB
4ts1:AB 5cna:BD 5cna:CD 5csc:AB 5rub:AB 6q21:AB 7aat:AB 7ins:AG 7ins:BG 7ins:DG
7tim:AB 8atc:AB 8cat:AB 8fab:CD 8rsa:AB 9gpb:BD 9ldt:AB 9rub:AB 9wga:AB

Burial of atoms

All the surface areas introduced in this paper are the solvent-accessible surface area (ASA). They were calculated using the program ACCESS (Hubbard, 1992), which is an implementation of the Lee and Richards algorithm (Lee and Richards, 1971). The solvent probe size is 1.4 Å. We used the default van der Waals radii in the ACCESS program. In the surface area calculations, all the ordered water molecules in the PDB structures were ignored. The solvent accessibility of a residue is evaluated by the ratio between the summed atomic accessible surface areas of that residue in the protein and the same residue (X) type in an extended Ala–X–Ala tripeptide (Hubbard et al., 1991).
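The residue accessibility ratio defined in the last sentence can be sketched as follows. This is an illustrative sketch under stated assumptions, not the ACCESS program: the function name `relative_accessibility` is hypothetical, the per-atom ASA values are assumed to have been computed elsewhere, and the reference ASA of the residue in an extended Ala–X–Ala tripeptide must be supplied by the caller (no reference values are given in the text).

```python
from typing import Iterable


def relative_accessibility(atom_asa: Iterable[float], reference_asa: float) -> float:
    """Ratio of a residue's summed atomic ASA in the protein to the ASA of the
    same residue type in an extended Ala-X-Ala tripeptide.

    atom_asa      -- per-atom accessible surface areas of the residue (Å^2)
    reference_asa -- ASA of the residue type in the tripeptide (Å^2),
                     supplied by the caller
    """
    if reference_asa <= 0.0:
        raise ValueError("reference ASA must be positive")
    return sum(atom_asa) / reference_asa
```

For example, a residue whose atoms expose 12.0, 8.0 and 5.0 Å² against a reference of 100.0 Å² would have a relative accessibility of 0.25.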
Calculation of expected values

Given the observed occurrence of pairs of types $A_i$ ($i = 1, 2, \ldots, p$) and $B_j$ ($j = 1, 2, \ldots, q$), the expected value of the corresponding pairs can be calculated by mixing $A_i$ and $B_j$ randomly. Assume the number of observed pairs of types $A_i$ and $B_j$ is $n(A_i, B_j)$. First, we can evaluate the total number of $A_i$, i.e.

\[ n(A_i) = \sum_{j=1}^{q} n(A_i, B_j), \tag{1} \]

and the total number of $B_j$, i.e.

\[ n(B_j) = \sum_{i=1}^{p} n(A_i, B_j). \tag{2} \]

The total number of all pairs is

\[ n_{\mathrm{total}} = \sum_{i=1}^{p} \sum_{j=1}^{q} n(A_i, B_j). \tag{3} \]

Hence, the expected number of pairs of types $A_i$ and $B_j$ is

\[ n_e(A_i, B_j) = \frac{n(A_i)\, n(B_j)\, n_{\mathrm{total}}}{\sum_{i=1}^{p} \sum_{j=1}^{q} n(A_i)\, n(B_j)}. \tag{4} \]

Fig. 1. Geometrical variables of a hydrogen bond: d = distance between donor and acceptor; r = distance between acceptor and hydrogen; θ = angle of donor–hydrogen–acceptor; φ = angle of donor–acceptor–acceptor antecedent; γ = angle of hydrogen–acceptor–acceptor antecedent.

Table II. Interfaces of high-resolution structures

Interface   Resolution (Å)   S (Å²)    hb   sb   nw
1aal:AB     1.6              613.9     2    0    4
1aap:AB     1.5              786.5     8    0    7
1bab:AC     1.5              636.4     8    1    0
1bab:AD     1.5              1331.9    7    1    9
1bbb:AB     1.7              1649.8    9    1    5
1bbb:AC     1.7              693.5     3    0    1
1cho:EI     1.8              1466.1    10   0    6
1cmb:AB     1.8              3532.6    14   1    4
1cpc:AK     1.66             2007.3    16   4    9
1cse:EI     1.2              1303.7    16   0    8
1dhj:AB     1.8              825.7     4    0    5
1gd1:PQ     1.8              2602.3    12   0    126
1gd1:PR     1.8              729.2     5    1    20
1gma:AB     0.86             1691.0    30   2    0
1gmq:AB     1.8              393.6     2    0    2
1hvi:AB     1.8              2859.6    24   4    1
1isu:AB     1.5              527.9     0    0    2
1lmb:34     1.8              1394.8    6    2    0
1lys:AB     1.72             528.4     1    0    5
1mol:AB     1.7              1010.8    3    2    3
1nco:AB     1.8              697.9     5    0    6
1nsc:AB     1.7              3652.4    19   3    16
1srn:AB     1.8              1599.2    13   2    5
1tgx:AB     1.55             986.5     1    0    24
1trz:BD     1.6              1007.0    7    0    3
1tta:AB     1.7              1584.5    16   0    10
1vfa:AB     1.8              1582.1    6    1    1
1vfb:AC     1.8              627.4     3    1    3
1vfb:BC     1.8              598.1     6    0    5
256b:AB     1.4              632.4     5    0    3
2aza:AB     1.8              938.6     0    0    71
2bbk:HJ     1.75             1390.4    8    2    192
2bbk:HL     1.75             3094.1    19   4    27
2bbk:LJ     1.75             1752.9    12   1    27
2ccy:AB     1.67             1644.6    3    0    4
2cga:AB     1.8              1133.6    0    0    9
2gst:AB     1.8              2806.6    11   4    17
2hmz:CD     1.66             1910.3    12   10   45
2ltn:AC     1.7              1386.8    14   1    7
2ltn:CD     1.7              6110.3    74   1    17
2msb:AB     1.7              1291.7    5    0    4
2ohx:AB     1.8              3168.5    18   0    147
2spc:AB     1.8              5209.9    16   12   5
2trx:AB     1.68             570.7     1    0    6
2utg:AB     1.64             3037.9    4    0    6
3ins:AB     1.5              1602.1    9    1    0
3mds:AB     1.8              1841.4    10   2    68
3sdh:AB     1.4              1960.3    11   4    13
3sgb:EI     1.8              1095.2    8    0    2
4cha:AB     1.68             2050.0    10   0    6
5rub:AB     1.7              5205.9    30   4    16
8fab:CD     1.8              3291.8    11   2    15
8rsa:AB     1.8              1582.6    9    4    4
9wga:AB     1.8              211.8     2    1    69

Interfaces whose PDB structures have a resolution of ≤1.8 Å. Resolution is the resolution of the PDB structure; S is the total buried ASA in the interface; hb is the number of hydrogen bonds between the two chains; sb is the number of salt bridges across the interface and nw is the number of water molecules which form hydrogen bonds with both chains across the interface.

Results

In this section we present the results of a statistical analysis of hydrogen bonds and salt bridges. Unless stated otherwise, the statistics were carried out on the 319 protein interfaces listed in Table I and described in the Methods section. In some cases where the results are sensitive to the resolution of the structures, we employed the 54 interfaces with high-resolution structures in Table II.

Hydrogen bonds

There are 3442 hydrogen bonds across the 319 protein interfaces of our dataset. In the following, we analyze their distribution, composition and geometry. In some cases, we compare their properties with those of 98599 intra-chain hydrogen bonds found in the 550 chains which compose the interfaces in Table I.

Number of hydrogen bonds across interface. The average number of hydrogen bonds per interface is 10.69 for the interfaces listed in Table I. Similarly, there are 10.26 hydrogen bonds per interface for the high-resolution structures in Table II.
This is in agreement with an earlier statistical analysis based on a much smaller dataset (15 interfaces), where 8–13 (an average of 10) hydrogen bonds were found per interface (Janin and Chothia, 1990). The standard deviation for the number of hydrogen bonds per interface in our study is 12.35, which is very large. This is also reflected in the distribution as shown in Figure 2a. The number of hydrogen bonds is strongly correlated with the total buried accessible surface area (ASA) of the interface, with a correlation coefficient of 0.89. The number of hydrogen bonds, n, and the buried ASA of an interface, s, can be matched by a linear relationship, i.e.

\[ n = 5.34 \times 10^{-3}\ \text{Å}^{-2}\, s. \tag{5} \]

Figure 3 shows the n–s relationship and the fitting line of Equation 5. The strong correlation between n and s also illustrates a relatively narrow distribution of hydrogen bond density across the protein interfaces, as shown in Figure 2b. The average hydrogen bond density is 4.74 × 10⁻³/Å² (close to the fitting coefficient in Equation 5) with a standard deviation of 2.88 × 10⁻³/Å². This means that on average one hydrogen bond is expected if 100 Å² of ASA is buried on each side of the protein interface.

Fig. 2. Distribution of hydrogen bonds. (a) Number of protein interfaces versus number of hydrogen bonds in each protein interface; (b) number of protein interfaces versus hydrogen bond density, i.e. number of hydrogen bonds per 100 Å² of buried ASA on the interface.

Fig. 3. Relationship between the number of hydrogen bonds, n, and the buried ASA of an interface, s. The scattered dots represent the data of the interface; the line is the linear fit of n–s, as shown in Equation 5.

There are 21 interfaces without any hydrogen bond. They are 1ake:AB, 1bbr:KN, 1c2r:AB, 1fxi:AD, 1hpl:AB, 1isu:AB, 1ovo:CD, 1psp:AB, 1srd:AD, 1tcb:AB, 2aza:AB, 2cga:AB, 2mlt:AB, 2mta:HA, 2mta:LA, 2pcc:AC, 2rsl:AB, 4fbp:AD, 4rub:BV, 4rub:ST and 9gpb:BD. Most of them are interfaces between two identical monomers in an asymmetric unit of crystal packing which are not likely to occur in the solvent (Janin and Rodier, 1995). The other interfaces are those between subunits of quaternary structures. It is likely that all rigid binding complexes involve hydrogen bonds across their interfaces.

Composition of hydrogen bonds. The composition of the hydrogen bonds for the types oxygen–oxygen (O–O), nitrogen–nitrogen (N–N) and oxygen–nitrogen (O–N) is shown in Table III. If oxygen and nitrogen atoms contact in a random manner to form hydrogen bonds, the expected percentages for the types O–O, N–N and O–N are 32.6, 18.3 and 48.6%, respectively. The statistics we have obtained reflect a strongly biased percentage. The hydrogen bonds across the interfaces are predominantly the oxygen–nitrogen type. There are very few hydrogen bonds between nitrogen atoms, because few types of nitrogens in amino acids (only Nδ1 and Nε2 of histidine) can serve as hydrogen bond acceptors.

Table III. Composition of hydrogen bonds per interface

Variable                        O–O     N–N     O–N
Mean (number per interface)     1.63    0.08    8.93
Standard deviation              2.58    0.40    10.43
Percentage                      15.2    0.7     83.5
Expected percentage             32.6    18.3    48.6

The rest of the hydrogen bonds are associated with sulfur atoms.

The composition of hydrogen bonds associated with main chains and side chains is shown in Table IV.

Table IV. Interfacial hydrogen bonds associated with main chains and side chains

Variable                        Main chain–main chain   Main chain–side chain   Side chain–side chain
Mean (number per interface)     2.42                    3.78                    4.20
Standard deviation              4.94                    4.97                    5.35
Percentage                      22.6                    35.4                    39.3
Expected percentage             16.7                    47.2                    33.4

The rest of the hydrogen bonds are associated with groups other than amino acids.
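The expected percentages quoted for Tables III and IV follow from the random-mixing calculation of Equations 1–4 in the Methods. The sketch below shows that calculation for a generic table of observed pair counts. It is an illustration only: the function name `expected_pair_counts` and the small input table in the usage note are hypothetical, and the counts are not data from this study.

```python
from typing import List, Sequence


def expected_pair_counts(observed: Sequence[Sequence[float]]) -> List[List[float]]:
    """Expected counts n_e(A_i, B_j) under random mixing (Equations 1-4).

    observed[i][j] is the observed number of pairs n(A_i, B_j).
    """
    row_totals = [sum(row) for row in observed]        # n(A_i), Equation 1
    col_totals = [sum(col) for col in zip(*observed)]  # n(B_j), Equation 2
    n_total = sum(row_totals)                          # Equation 3
    if n_total == 0:
        raise ValueError("no observed pairs")
    # Denominator of Equation 4: sum over all i, j of n(A_i) * n(B_j).
    norm = sum(a * b for a in row_totals for b in col_totals)
    return [[a * b * n_total / norm for b in col_totals] for a in row_totals]
```

With the made-up table `[[8, 2], [2, 8]]`, every row and column total is 10 and the total is 20, so each expected count is 10 × 10 × 20 / 400 = 5. The expected counts always sum to the observed total, which is what allows them to be reported as percentages alongside the observed ones.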
The percentage occurrences of main chain–main chain, main chain–side chain and side chain–side chain hydrogen bonds are 64.8, 22.8 and 12.4%, respectively, within chains, but 22.6, 35.4 and 39.3%, respectively, across interfaces. The much larger share of main chain–main chain hydrogen bonds within chains is due to the fact that protein interiors mostly consist of hydrophobic residues which form well defined α-helices and β-sheets. Although appreciably fewer hydrogen bonds are formed by main chain atoms across the protein–protein interfaces, there are substantially more main chain–main chain hydrogen bonds (22.6%) than expected (16.7%). It is interesting to note that the packing of some inter-protein backbones can mimic the packing of intra-protein secondary structures. Figure 4 illustrates that the packing between subtilisin and the chymotrypsin inhibitor forms β-sheets. In this case, the main chain–main chain hydrogen bonds form a close compact structure similar to the case in monomers.

Fig. 4. Stereoview of the backbone conformation across the interface of the complex 2sni (with a resolution of 2.1 Å). The receptor subtilisin (chain E in 2sni) is in pink; the chymotrypsin inhibitor (chain I in 2sni) is shown in a combination of green (for carbon atoms), red (for oxygen atoms) and blue (for nitrogen atoms). The dashed lines represent the hydrogen bonds. This picture and Figures 12, 14 and 16 were generated by the program QUANTA (Molecular Simulations, 1994).

Fig. 5. Distribution [P(d)] of the distance between a donor and an acceptor, d, for the inter-chain (solid lines) and intra-chain (broken lines) hydrogen bonds. The interval of sample points is 0.05 Å in this figure and Figure 6.

Fig. 6. Distribution [P(r)] of the distance between hydrogen and acceptor, r, for the inter-chain (solid lines) and intra-chain (broken lines) hydrogen bonds.
Charged groups are heavily involved in the side chain–side chain packing across the interfaces. About half (47.2%) of side chain–side chain hydrogen bonds are salt bridges.

Geometry of hydrogen bonds. Figure 5 shows the distribution of the distance between a donor and an acceptor for the inter-chain and intra-chain hydrogen bonds. The average distance between a donor and an acceptor across the protein interface is 2.92 Å, with a standard deviation of 0.24 Å. The maximum distance is 3.76 Å and the minimum distance is 1.83 Å. The average distance for the inter-chain oxygen–nitrogen type is 2.93 Å, with a standard deviation of 0.21 Å. The average value is very close to the N–O distance observed by neutron diffraction in the crystal structure of amino acids, i.e. from 2.872 to 2.895 Å (Jeffrey and Saenger, 1991). However, the distribution in our statistics is much wider than this range. The distribution of intra-chain hydrogen bonds is similar to that of the inter-chain hydrogen bonds, with an average distance of 2.95 Å and a standard deviation of 0.21 Å.

The distribution of the hydrogen–acceptor distances, calculated by HBPLUS, is shown in Figure 6 for both the inter-chain and intra-chain hydrogen bonds. The distribution of the inter-chain hydrogen bonds is wide, with an average of 2.03 Å and a standard deviation of 0.24 Å. The hydrogen–acceptor distance is one of the measurements of hydrogen bond strength: the shorter the distance, the stronger is the hydrogen bond. Only 1.14% of all the inter-chain hydrogen bonds have a hydrogen–acceptor distance in the range 1.2–1.5 Å, which can be considered as strong hydrogen bonds (Jeffrey and Saenger, 1991). Since normal and weak hydrogen bonding interactions can be treated by electrostatics (Jeffrey and Saenger, 1991), it is a good approximation to model all the inter-chain hydrogen bonds by the electrostatic interactions between their partial charges. Again, the intra-chain hydrogen bonds have a similar distribution, with an average distance of 2.06 Å and a standard deviation of 0.22 Å.

The wide diversity of hydrogen bonds in terms of geometry is also reflected in the wide distribution of the bond angle between donor, hydrogen and acceptor, as demonstrated in Figure 7. The average bond angle of inter-chain hydrogen bonds is 150.7°, with a standard deviation of 17.1°. The intra-chain hydrogen bonds have an average bond angle of 151.5°, with a standard deviation of 16.3°. Hence inter-chain hydrogen bonds are slightly more off-linear and have a slightly wider distribution than the intra-chain ones. The bond angle is another assessment of the hydrogen bond strength: the closer the bond angle to 180°, the more significant is the electrostatic contribution and the stronger is the hydrogen bond. Very strong hydrogen bonds have bond angles around 180°, while normal or weak bonds have angles in the range 160 ± 20° (Jeffrey and Saenger, 1991). Therefore, the majority of the bonds are normal or weak in terms of energetics, in accordance with the conclusion drawn from the hydrogen–acceptor distance.

Fig. 7. Distribution [P(θ)] of the hydrogen bond angle (θ), i.e. the angle between donor, hydrogen and acceptor, for the inter-chain (solid lines) and intra-chain (broken lines) hydrogen bonds. The interval of the calculated [θ, P(θ)] points is 2.0° in this figure and in Figures 9 and 10.

The distribution of inter-chain hydrogen bond strength can be viewed clearly in the plot of bond angle versus the distance between hydrogen and acceptor, as shown in Figure 8. Most of the interfacial hydrogen bonds have a normal strength, with a distance between hydrogen and acceptor of ~2 Å and a bond angle of ~160°. There are not many weak bonds at the lower right part of the figure, and there are few very strong hydrogen bonds in the upper left corner.
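The two strength indicators quoted from Jeffrey and Saenger (1991) can be turned into a small tally of the kind behind the 1.14% figure. This is a minimal sketch under stated assumptions: the function names are hypothetical, the only thresholds used are the 1.2–1.5 Å distance range and the 160 ± 20° angle range given in the text, and the distances in the usage note are made-up values, not data from this study.

```python
from typing import Iterable

STRONG_R_RANGE = (1.2, 1.5)     # hydrogen-acceptor distance of strong bonds, Å
NORMAL_ANGLE_CENTRE = 160.0     # centre of the normal/weak angle range, degrees
NORMAL_ANGLE_HALF_WIDTH = 20.0  # half-width of that range, degrees


def is_strong_by_distance(r: float) -> bool:
    """True when the hydrogen-acceptor distance lies in the 1.2-1.5 Å range."""
    low, high = STRONG_R_RANGE
    return low <= r <= high


def in_normal_or_weak_angle_range(theta: float) -> bool:
    """True when the donor-hydrogen-acceptor angle lies within 160 +/- 20 degrees."""
    return abs(theta - NORMAL_ANGLE_CENTRE) <= NORMAL_ANGLE_HALF_WIDTH


def fraction_strong(distances: Iterable[float]) -> float:
    """Fraction of hydrogen bonds whose hydrogen-acceptor distance is 'strong'."""
    values = list(distances)
    if not values:
        raise ValueError("no hydrogen bonds supplied")
    return sum(1 for r in values if is_strong_by_distance(r)) / len(values)
```

For the made-up distances 1.4, 2.0, 2.1 and 1.9 Å, one bond out of four falls in the strong range, so `fraction_strong` returns 0.25.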
Fig. 8. Hydrogen bond angle versus the distance between hydrogen and acceptor for the 3442 inter-chain hydrogen bonds across protein interfaces.

There are significant differences between the intra-chain and the inter-chain hydrogen bonds in the distribution of the angle between hydrogen, acceptor and acceptor antecedent (γ), as shown in Figure 9a. Although the distribution of interfacial hydrogen bonds has a larger fluctuation than that of the intra-chain hydrogen bonds owing to the smaller samples, they are clearly much more off-linear and have a significantly wider distribution. Such differences are also revealed in the distribution of the angle between donor, acceptor and acceptor antecedent (φ), as shown in Figure 10a. The lower linearity of the γ and φ angles in the interfacial hydrogen bonds compared with the intra-monomer ones results in weaker dipolar interactions between the hydrogen–donor and acceptor–acceptor antecedent. Hence the inter-chain hydrogen bonds are generally weaker and have a larger diversity than the intra-chain ones. To understand the origin of the above differences, we calculated the distribution of γ and φ for the main chain–main chain hydrogen bonds and for the main chain–side chain and side chain–side chain hydrogen bonds separately, as shown in Figures 9b and 10b and in Table V. Compared with the overall distribution, the main chain–main chain hydrogen bonds have a similar distribution across the protein interface and within the same chain. The similarity between the inter-chain and the intra-chain hydrogen bonds is also observed in the distribution of main chain–side chain and side chain–side chain types. However, the main chain–main chain hydrogen bonds are more linear and have a narrower distribution, i.e. have stronger interactions than the main chain–side chain and side chain–side chain types.
Hence the difference between inter-chain and intra-chain hydrogen bonds in the distribution of γ and φ mainly arises from their different percentage occurrences in the main chain–main chain type (22.6% in the protein interface vs 64.8% in the same chain). Since the main chain–main chain hydrogen bonds originate largely from secondary structure elements, that is entirely understandable. In particular, a large proportion of these in the monomers are α-helices. We also compared the distribution of γ and φ between interfaces with large buried ASA and those with small ones. No significant difference was found between these two groups. The interfacial hydrogen bonds with a total buried ASA of ≥2500 Å² have an average γ of 131.9°, with a standard deviation of 19.0°, while others have an average γ of 132.9°, with a standard deviation of 19.4°. Correspondingly, φ is 133.3 ± 19.3° for interfaces with large buried ASA and 134.5 ± 19.2° for others. Interfaces with small buried surface areas are likely to represent rigid binding, whereas those with large ones tend to reflect a change in the conformation of the associating proteins upon binding. Nevertheless, the buried ASA is not a clear-cut criterion for distinguishing between rigid and flexible associations. For example, the flexible protein–short peptide binding also has small buried ASA. Hence our data are insufficient to differentiate between the hydrogen bonds in rigid versus flexible binding.

Fig. 9. Distribution [P(γ)] of the angle of hydrogen–acceptor–acceptor antecedent (γ) for the inter-chain (solid lines) and intra-chain (broken lines) hydrogen bonds. (a) All the hydrogen bonds; (b) the main chain–main chain hydrogen bonds (thick lines) vs the main chain–side chain and side chain–side chain ones (thin lines).

Fig. 10. Distribution [P(φ)] of the angle of donor–acceptor–acceptor antecedent (φ) for the inter-chain (solid lines) and intra-chain (broken lines) hydrogen bonds. (a) All the hydrogen bonds; (b) the main chain–main chain (thick lines) vs the main chain–side chain and side chain–side chain (thin lines) hydrogen bonds.

Table V. Average ± standard deviation of γ and φ
Angle   Type          All            Main chain–main chain   Main chain–side chain and side chain–side chain
γ (°)   Inter-chain   129.3 ± 27.4   145.4 ± 17.0            127.3 ± 21.2
γ (°)   Intra-chain   136.4 ± 19.9   140.9 ± 19.3            128.3 ± 18.5
φ (°)   Inter-chain   129.9 ± 29.3   148.9 ± 14.9            129.1 ± 18.1
φ (°)   Intra-chain   140.1 ± 20.1   146.2 ± 18.0            128.9 ± 18.8

Burial of hydrogen bonds. Figure 11a indicates that most interfacial donors/acceptors which form hydrogen bonds either with the same or with a different chain are highly buried. (An interfacial atom is defined as one whose ASA changes by >0.1 Å² between the unbound chain and the complex.) The more an interfacial donor or acceptor atom is buried, the more likely it is to form a hydrogen bond with a protein atom (see Figure 11b); 5153 out of 6727 (76.6%) fully buried oxygen/nitrogen atoms form hydrogen bonds in comparison with 797 out of 965 (82.6%) for the high-resolution structures in Table II. The discrepancy in the percentages between the dataset of the 319 protein interfaces and that with higher resolution structures indicates that some actual hydrogen bonds across protein interfaces are excluded owing to errors in atomic coordinates in low-resolution structures. The hydrogen donors or acceptors with non-zero accessibility to solvent and no hydrogen bonding to protein atoms are most likely to form hydrogen bonds with the solvent.

Specificity of hydrogen bonds. We observed a conservation in the pattern of some interfacial hydrogen bonds. An example is shown in the comparison between 1cho (α-chymotrypsin complexed with the turkey ovomucoid third domain) and 1acb (α-chymotrypsin complexed with eglin C).
Figure 12a demonstrates that the backbone conformations in the binding regions of the ovomucoid third domain and eglin C are remarkably similar, although the two inhibitors differ in their global structures. The inter-protein hydrogen bonds are also highly conserved, as shown in Figure 12b.

Role of water. Table VI shows the occurrence of water in the hydrogen bonding network across the interfaces of the high-resolution structures in Table II; 1070 water molecules are involved in bridging 4061 polar atom pairs across the interface by hydrogen bonds. On average, each water molecule connects 3.8 cross-chain atom pairs. Some water molecules have a potential to form hydrogen bonds with more than four protein donors/acceptors. These water molecules are most likely to allocate their hydrogen bonds to the protein atoms dynamically, in the so-called ‘flip-flop’ mechanism (Betzel et al., 1984; Meyer, 1992). There are 19.8 waters per interface. However, as shown in Table II, most interfaces have <10 waters. There are some interfaces which have >30 waters, namely 1gd1:PQ, 2aza:AB, 2bbk:HJ, 2hmz:CD, 2ohx:AB, 3mds:AB and 9wga:AB, all of which are interfaces between two monomers of the same protein in an asymmetric unit. The complexes of these monomers are most likely to be enforced by the crystallization, and may not be stable in the solvent. The observed occurrences of the donor–H2O–donor and acceptor–H2O–acceptor pairs are more frequent than their corresponding expected values, indicating that bound interfacial waters tend to mediate pairs of polar groups which cannot form hydrogen bonds directly with each other.

Fig. 11. (a) Number of interfacial donors/acceptors which form hydrogen bonds versus solvent accessible surface area; (b) percentage of fulfilled donors/acceptors hydrogen bonding to protein atoms at certain ASA.

Table VI. Inter-chain hydrogen bond bridges mediated by water
Type                     Occurrence   Expected
Donor–H2O–donor          617          525.6
Acceptor–H2O–acceptor    1756         1664.6
Donor–H2O–acceptor       1688         1870.8

Salt bridges

There are 623 salt bridges across the 319 protein interfaces. On average, there are about two salt bridges per interface. The number of salt bridges of each high-resolution structure is listed in Table II.

Salt bridge distributions. The distribution of different types of salt bridges is shown in Table VII. The count of each salt bridge type is very close to its expected value, indicating a lack of discrimination between specific salt bridge donors and acceptors when they form salt links.

Table VII. Salt bridge distribution
Residue      Arg           Lys           His         N-terminal
Asp          179 (159.4)   100 (118.0)   11 (10.8)   4 (4.7)
Glu          148 (162.7)   138 (120.5)   11 (11.0)   2 (4.8)
C-terminal   12 (16.3)     13 (12.1)     1 (1.1)     4 (0.5)
Numbers in parentheses are the expected values for each type.

Charge distributions. Figure 13 shows the distribution of opposite charge pairs and like charge pairs across the interfaces. There are 18 058 like charge pairs within a cut-off of 20 Å. There are 19 172 opposite charge pairs within the same cut-off. Comparison of the like charge with the opposite charge pair distribution shows that the opposite charge pairs have a strong peak of P(r)/r² at 2.75 Å (see Figure 13), suggesting that salt bridges across protein interfaces are highly favorable. However, charge complementarity does not always occur. There are examples where close pairs of like charges are observed across protein interfaces, as shown in Figure 14.

Burial of salt bridges. Figure 15 compares the solvent accessibility of the polar atoms on the side chains of salt bridges across the interfaces with that within the chains. Most inter-chain salt bridges are highly buried, whereas intra-chain ones are generally much less buried, indicating that the environment of the interfacial charges is different from that of the charges in monomers. In the case of protein complexes formed by rigid binding, if both proteins were allowed to change their conformations freely, the system could form a structure whose environment of the interfacial charges is similar to that of the charges in monomers. This further indicates that, unlike folded monomers, protein complexes formed by rigid binding may be far away from the global minimum conformation.

Discussion

Our statistical analysis is based on the high-quality X-ray structures of proteins. It allows us to discuss the role of hydrogen bonds, salt bridges and bound water molecules in protein–protein associations. It further enables us to address the similarities and the differences between protein–protein binding and protein folding.

Similarities and differences between interfaces and monomers

Our study illustrates both the similarities and the differences between the interfacial and the intra-chain hydrogen bonds. The distribution of the distances between the donor and acceptor and between the hydrogen and the acceptor in the interfaces and in the monomers is similar. This suggests that the overall packing between two chains is similar to the packing observed within monomers. In both cases, the hydrogen bonds are generally not very strong, with a wide distribution in their geometries.
However, inspection of the angles between donor, hydrogen and acceptor, and in particular of the angles between hydrogen, acceptor and acceptor antecedent, and of those between donor, acceptor and acceptor antecedent, reveals that the interfacial hydrogen bonds are less optimal than the intra-chain ones, with their geometries demonstrating wider distributions. Although there is no other experimental evidence available, the conclusion is justified since each angle was calculated from the atomic coordinates of an experimental protein structure.

Fig. 12. (a) Stereoview of the superimposed structures for 1cho (with a resolution of 1.8 Å) and 1acb (with a resolution of 2.0 Å). The α-chymotrypsin portions of both complexes are matched at the Cα positions. (b) Stereoview of the superimposed residues at the binding sites shown in (a). The continuous peptides, from left to right, are Val343, Thr344, Leu345, Asp346 and Leu347 of 1cho, and Cys316, Thr317, Leu318, Glu319 and Tyr320 of 1acb. The surrounding residues of α-chymotrypsin, from left to right, are Gly216, Ser214, Ser195, Gly193 and Phe41, respectively. The small balls show the bound water molecules. The dashed lines represent the hydrogen bonds. In both (a) and (b), the purple and orange represent α-chymotrypsin for 1cho and 1acb, respectively; the blue shows the turkey ovomucoid third domain in 1cho; the red shows eglin C in 1acb.

Fig. 13. (a) Distribution of the distance between opposite charges (solid line) and like charges (dashed lines), P(r); (b) P(r) normalized by r². The interval for calculating each point is 0.5 Å.

Our study suggests that the different quality of hydrogen bonds across protein interfaces compared with those within the same chains arises from the larger number of main chain–side chain and side chain–side chain hydrogen bonds which are involved in binding than in folding.
The chemical bonds linking atoms in proteins limit their arrangements in both folding and binding, and prevent polar groups from forming high-quality hydrogen bonds like those observed in amino acid crystals. This explains why the distribution is much wider in both interfacial and intra-chain hydrogen bonds than in the hydrogen bonds in amino acid crystals.

Fig. 14. Examples of like charge pairs. (a) 2pcb (with a resolution of 2.8 Å): yeast cytochrome c peroxidase complex with horse heart cytochrome c; (b) 3sc2 (with a resolution of 2.2 Å): the A–B chains of serine carboxypeptidase II; (c) 1hvi (with a resolution of 1.8 Å): A–B chains of HIV-1 protease complexed with the inhibitor A77003; (d) 3rp2 (with a resolution of 1.9 Å): A–B chains of rat mast cell protease.

Fig. 15. Distribution of solvent accessibility of the polar atoms on the side chains for salt bridges across the interfaces (a) and within the same chains (b).

On the other hand, there is a difference in the constraints between main chain–main chain compared with main chain–side chain and side chain–side chain hydrogen bonds. The main chain–main chain hydrogen bonds can form optimal configurations collectively, such as α-helices and β-sheets. However, a hydrophilic side chain often has several polar atoms and the dipole moments of donor–hydrogen and/or of acceptor–acceptor antecedent cannot be aligned optimally in hydrogen bonds simultaneously, as shown in Figure 16. The side chain movements upon binding accommodate hydrogen bonds between an antibody and a lysozyme, in a similar manner to that occurring in forming a salt bridge (Norel et al., 1997). To form the two hydrogen bonds, Gln121 of lysozyme shifts its position and changes its conformation during the binding. However, the alignment of the direction along acceptor antecedent–acceptor is restricted by the bond lengths and bond angles in the amino acids.
The movements of the side chain atoms drive them off-equilibrium from the relaxed state. The more the hydrophilic side chains are buried and packed together, the more frustration they experience in their alignment to reach the free energy global minimum state. Such frustration in the packing of the polar/charged side chains is similar to the case of the spin-glass state (Wolynes, 1990), where spins are trapped in the metastable glass state, rather than being in the global minimum. In monomeric protein folding, such a problem is more likely to be solved by excluding hydrophilic side chains to the surface, where a polar/charged atom can easily form a high-quality hydrogen bond with the solvent owing to the flexibility of water molecules. However, in the case of rigid protein binding, as shown in Figure 15, hydrophilic side chains are more likely to be buried in the interfaces to form sub-optimal main chain–side chain and side chain–side chain hydrogen bonds. Hence bound complexes are generally more off-minima than monomers.

Fig. 16. Stereoview of the side chain movement during the formation of hydrogen bonds in binding. The bound structure (1vfb with a resolution of 1.8 Å) is the Fv fragment of mouse monoclonal antibody D1.3 (chain A) complexed with the hen egg lysozyme (chain C). It is shown in a combination of green (for carbon atoms), red (for oxygen atoms) and blue (for nitrogen atoms). Gln121 in the unbound state (5lym, also with a resolution of 1.8 Å) is in pink. The dashed lines show the hydrogen bonds. The dihedral angle Cα–Cβ–Cγ–Cδ of Gln121 is also changed from 172.5° in the unbound state to −167.9° in the bound state. The unbound lysozyme (5lym) is matched with the bound lysozyme (chain C of 1vfb) at the Cα positions.

The difference between folding and binding is also observed in the participation of water in the hydrogen bonding network.
There are significantly more buried water molecules in the protein interface than in the interior of the protein. Water can mediate between two hydrogen bond donors, or acceptors, across the interface. A water molecule between a donor and an acceptor across the interface can usually form good hydrogen bonds with both atoms owing to its small size and flexibility. If the water is removed, the donor and acceptor may not form a hydrogen bond or only form a poor one owing to the constraints imposed by both proteins during their binding. Proteins compete with water molecules in binding (Ringe, 1995). The generally weaker inter-chain hydrogen bonds do not compete with the binding of water as efficiently as the intra-chain ones. On the other hand, in the monomer interior the buried hydrogen bonds are predominantly of the main chain–main chain type, which are strong and hence compete favorably with hydrogen bonding to water. In addition, the monomer interior, which is more hydrophobic, is typically unfavorable for buried waters, while a more ‘friendly’ hydrophilic environment in the protein interface can easily form hydrogen bonds with buried water. These are likely to constitute the main reasons why more water molecules are buried in the protein interface than in the interior of the protein. The differences between folding and binding, reflected in the quality of the hydrogen bonds, further support the notion that protein complexes often do not reach the global energy minima (Lin et al., 1995; Xu et al., 1997). The large number of hydrophilic side chains buried across the interfaces are footprints of rigid-body binding. These, in turn, serve as a clear mark of a metastable state, manifested both in more hydrogen bonds involving side chains and in more bound water molecules buried in the interface.
If two complexed proteins were allowed to undergo a conformational change, optimizing their bound structure freely, such footprints could disappear with the complex reaching its hypothetical global minimum state. However, the high kinetic barrier prevents such an occurrence from happening (Xu et al., 1997). How close a bound complex is to the presumed global minimum state probably varies from case to case, owing to the diversity of biological systems. Some protein interfaces are dominated by the main chain–main chain hydrogen bonds. For example, in the complex between subtilisin and chymotrypsin inhibitor, as shown in Figure 4, eight out of 10 interfacial hydrogen bonds are of the main chain–main chain type. The complex may be close to its global minimum. On the other hand, many other interfaces have few main chain–main chain hydrogen bonds. In the antibody–antigen complex 3hfl (FAB fragment HyHEL-5 complexed with lysozyme), only one out of 11 interfacial hydrogen bonds is of the main chain–main chain type, with many bound water molecules buried at the interface. In this case, the bound state is expected to be far away from its global minimum state. This may originate from the function of the antibody. The small hypervariable regions need to be able to bind an immense variety of antigens within the same structural framework (Creighton, 1993). More hydrophilic side chains are likely to be involved in the binding interface to be available for mutations that can adapt the hydrogen bonding patterns to different antigens. This restriction in the allowed conformations imposes further constraints, in addition to the rigid body binding. Hence for the complex it is even more difficult to attain its global minimum state. Electrostatic complementarity and binding specificity Electrostatic complementarity across the binding interface is revealed in the formation of hydrogen bonds and salt bridges. 
Most highly buried donors and acceptors form hydrogen bonds. The charge distribution indicates that opposite charge pairs are substantially more favorable than like charges. In some cases a minor perturbation of the hydrogen bond/charge network across the binding interface may substantially destabilize the complex (Chacko et al., 1995). Backbone–backbone hydrogen bonds play an important role in protein–protein interactions. Protein backbones of different chains can associate to form complementary β-sheets across the bound interface. This is in agreement with a recent statistical analysis by Vakser (1996), showing that the complementarity between backbones may facilitate the initial phase of the binding. Binding specificity, defined by electrostatic complementarity, is clearly revealed in the conserved pattern of the hydrogen bonding network, as illustrated in Figure 12. A requirement for a conserved hydrogen bonding network has also been observed in several enzyme–ligand complexes. The selectivity of the binding of the protein kinase family is achieved through preserved hydrogen bonds (Xu et al., 1996). We have recently predicted the binding between the yeast chorismate mutase and a transition state analog, and compared it with the binding between a bacterial chorismate mutase and the same ligand (Lin et al., 1997). Our study shows that the binding function is conserved via a common mechanism with common salt bridges and hydrogen bonds, rather than via a conserved sequence or global structure. Nevertheless, the atom-based complementarity is not an absolute requirement. There are some fully buried hydrogen bond donors/acceptors which do not form any hydrogen bonds. There are also some like charge pairs.
In the case of the HIV protease (1hvi in Figure 14c), it is known that only one of the carboxyls of Asp25(A) and Asp25(B) is ionized, although the position of the proton has not been determined experimentally (Creighton, 1993). A highly unfavorable like charge pair can shift the pKa of the ionizable groups so that either one or both ions are neutral. This is likely the case for 3sc2 shown in Figure 14b also. The like charge pairs may also form triads with their opposite charges, to obtain partial electrostatic compensation, as depicted in Figure 14a and d. If a like charge pair is solvent accessible, it may attract counter ions in the solvent to balance its charges. Implications to the docking problem Our results are expected to aid in identifying potential binding sites and in scoring binding modes predicted by geometrically based methods. Geometry-based docking approaches generally yield a large number of potential ligand binding conformations (Norel et al., 1994, 1995; Fischer et al., 1995). Current scoring schemes often fail to predict the correct binding modes (Lybrand, 1995). Hydrogen bonds and salt bridges, as contributors to strong physical interactions, comprise an important component in the assessment of binding (Meyer et al., 1996). By using statistically derived data on hydrogen bonds and salt bridges across protein interfaces, one may develop a scoring system with a strong chemical relevance to identify the binding modes. This approach can be particularly useful in the flexible docking problem, where receptors and/or ligands may change their shape upon binding, making it extremely difficult to predict correct binding conformations utilizing geometrically based methods. 
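One simple way to turn observed and expected counts into a score is a log-odds term, ln(observed/expected), summed over the contacts in a candidate binding mode. The sketch below is an assumption about how such a scheme could look and is not a scoring function proposed in this paper; the function names are hypothetical. The counts are those of Table VII, paired with residue types by reading the table column by column, which is an interpretation of the printed layout.

```python
import math

# (observed, expected) salt bridge counts taken from Table VII.
# Keys are (positive group, negative group); this pairing is an assumed reading.
SALT_BRIDGE_COUNTS = {
    ("Arg", "Asp"): (179, 159.4),
    ("Arg", "Glu"): (148, 162.7),
    ("Arg", "C-terminal"): (12, 16.3),
    ("Lys", "Asp"): (100, 118.0),
    ("Lys", "Glu"): (138, 120.5),
    ("Lys", "C-terminal"): (13, 12.1),
    ("His", "Asp"): (11, 10.8),
    ("His", "Glu"): (11, 11.0),
    ("His", "C-terminal"): (1, 1.1),
    ("N-terminal", "Asp"): (4, 4.7),
    ("N-terminal", "Glu"): (2, 4.8),
    ("N-terminal", "C-terminal"): (4, 0.5),
}

def log_odds(positive, negative):
    """ln(observed/expected) for one salt bridge type."""
    observed, expected = SALT_BRIDGE_COUNTS[(positive, negative)]
    return math.log(observed / expected)

def score_binding_mode(salt_bridges):
    """Sum the log-odds terms over the salt bridges found in one docked pose."""
    return sum(log_odds(pos, neg) for pos, neg in salt_bridges)

# Hypothetical docked pose containing two inter-chain salt bridges.
example_mode = [("Arg", "Asp"), ("Lys", "Glu")]
example_score = score_binding_mode(example_mode)
```

Because the observed counts are close to their expected values, each term is small; a usable scheme would presumably need to combine such terms with the burial and geometry information discussed in this section.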
Since the surface patterns of receptors and ligands, in terms of atom composition, change very little upon binding, it is possible to use complementarity between hydrogen bonding donors and acceptors, and between opposite charges, and their relationship to the percentage burial in the formed interface, to identify bound from unbound configurations. Conserved patterns of hydrogen bonds may also be utilized. Hence the structure of a complex between a receptor and a ligand may be used as a template to predict the binding mode between another ligand and the same receptor if the two inhibitors are known to bind at the same region of the receptor. The distribution of the distances between the hydrogen bond donors and acceptors also sheds some light on the scoring of the docked predictions. The van der Waals radii for both nitrogen and oxygen atoms in the Charmm parameter set are 1.6 Å. Since the average distance between a donor and an acceptor is 2.92 Å, with a standard deviation of 0.24 Å, most hydrogen bonds have a distance smaller than the sum of the van der Waals radii of the donor and acceptor. This is understandable owing to the attractive interaction between the donor and acceptor in a hydrogen bond. However, such a strong penetration may affect the quality of the docked predictions. Typically, docking programs penalize any van der Waals penetrations (Norel et al., 1994, 1995; Fischer et al., 1995). Our study indicates that the scoring should not penalize a reasonable penetration between hydrogen bond donors and acceptors. This is particularly important when the ligand is small and very sensitive to the surface complementarity. Another practical problem in docking is buried water molecules across the binding interface. Our study shows that there are many bound waters across protein–protein interfaces. Water can also mediate protein–small ligand binding (Fauman et al., 1994). However, current docking methods do not include water. 
This may decrease the docking performance in some cases, notably in the antibody–antigen binding (Fischer et al., 1995; Meyer et al., 1996), where interfacial water molecules are present more extensively than in most other complexes. Our study on the statistical analysis of the interfacial water, and further work along these lines, may help to locate water molecules in binding and enhance docking predictions. Conclusion Although the types of interactions at protein–protein interfaces resemble those observed in the interior of protein monomers, there are also some inherent differences. In both cases the hydrophobic effect plays an important role. Nevertheless, the extent of the hydrophobic effect in the interior of protein monomers is significantly larger than that observed at protein–protein interfaces (Tsai et al., 1997). On the other hand, while salt bridges destabilize protein cores, they may contribute to stabilize protein associations. Inspection of the types of residues at the interfaces, as compared with the monomers, indicates that polar and charged residues constitute a larger percentage than in monomers. Here, we examined the hydrogen bonds and the salt bridges. To this end, we carried out an extensive analysis of the hydrogen bonds and of the salt bridges in a collection of 319 non-redundant protein–protein interfaces, assembled previously from protein X-ray structures. We found that, on average, there are 10.7 hydrogen bonds and 2.0 salt bridges per interface. Charge complementarity is found for both charges and hydrogen bonding donors/acceptors. However, 17.4% of fully buried donors or acceptors in high-resolution structures do not form any hydrogen bonds, and some like charges are at a close distance. Polar atoms on the backbone have a strong tendency to form hydrogen bonds with backbone atoms across the interface, and some main chain–main chain hydrogen bonds can form β-sheets.
In particular, our results indicate that the quality of the hydrogen bonds in the interfaces is not as good as that generally observed within the chains. This is reflected both in the angular distributions and in the significantly larger number of water molecules mediating hydrogen bonds at the interfaces. The lower quality of interfacial hydrogen bonds is attributed to the large number of polar/charged side chains buried across protein interfaces, which is enforced by the hydrophilic surface of the monomers and the rigid body binding. Rigid body protein–protein associations can be described by a three-state model, where the unfolded chains first fold to their native, lowest energy configurations. Subsequently, the folded chains associate, to form their bound complexes. Although relatively minor conformational rearrangements occur, basically the already folded chains have only three rotational and three translational degrees of freedom to optimize their binding, leaving many hydrophilic side chains and water molecules buried in the interface. This is unlike the case of protein folding. Hence the bound complex is more likely to exist in a metastable state, rather than at the global minimum. The results obtained in this study further enhance our understanding of the similarities and of the differences between folding and binding. They bear on the specificity of protein–protein associations. As such, they are useful for inhibitor design and for scoring multiple docked conformations predicted by the geometrically based methods. Here we have examined protein–protein binding, rather than protein–small molecule interactions or protein–nucleic acid associations. In the latter case, hydrogen bonds often conceivably play a dominant role, and hence their patterns can be used more directly to aid in distinguishing native from non-native binding orientations.
Their patterns of interactions not only dictate specific recognition, but also provide much of the stability. Acknowledgements We thank Drs Jie Liang, Shuo L.Lin, Aijun Li, David Covell, Anders Wallqvist, Saraswathi Vishveshwara and, in particular, Jacob Maizel, for helpful discussions. We thank the personnel at the Frederick Cancer Research and Development Center for their assistance. All the calculations presented in this paper were carried out on Silicon Graphics workstations operated by the Frederick Biomedical Supercomputing Center, National Cancer Institute. The research of R.Nussinov was sponsored by the National Cancer Institute, DHHS, under Contract No. 1-CO-74102 with SAIC, and in part by grant No. 95-00208 from the BSF, Israel, by a grant from the Israel Science Foundation administered by the Israel Academy of Sciences, and by the Rekanati Fund. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products or organizations imply endorsement by the US Government. By acceptance of this paper, the publisher or recipient acknowledges the right of the US Government to retain a non-exclusive, royalty-free license in and to any copyright covering the paper. References Baker,E.N. and Hubbard,R.E. (1984) Prog. Biophys. Mol. Biol., 44, 97. Barlow,D.J. and Thornton,J.M. (1983) J. Mol. Biol., 168, 867–885. Barlow,D.J. and Thornton,J.M. (1986) Biopolymers, 25, 1717–1733. Bartlett,P.A. and Marlowe,C.K. (1984) Trends Biochem. Sci., 9, 145–147. Bennett,M.J., Choe,S. and Eisenberg,D. (1994) Proc. Natl Acad. Sci. USA, 91, 3127–3131. Bennett,M.J., Schlunegger,M.P. and Eisenberg,D. (1995) Protein Sci., 4, 2455–2468. Bernstein,F.C., Koetzle,T.F., Williams,G.J.B., Meyer,E.F., Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535–542. Betzel,C., Saenger,W., Hingerty,B.E. and Brown,G.M. (1984) J. Am. Chem.
Soc., 106, 7545–7557. Bhat,T.N. et al. (1994) Proc. Natl Acad. Sci. USA, 91, 1089–1093. Chacko,S., Silverton,E., Kam-Morgan,L., Smith-Gill,S., Cohen,G. and Davies,D. (1995) J. Mol. Biol., 245, 261–274. Cherfils,J., Duquerroy,S. and Janin,J. (1991) Proteins: Struct. Funct. Genet., 11, 271–280. Creighton,T.E. (1993) Proteins. 2nd edn. Freeman, San Francisco. Fauman,E.B., Rutenber,E.E., Maley,G.F., Maley,F. and Stroud,R.M. (1994) Biochemistry, 33, 1502–1511. Fersht,A.R. (1984) Trends Biochem. Sci., 9, 145–147. Finkelstein,A.V., Gutin,A.M. and Badretdinov,A.Y. (1995) Proteins: Struct. Funct. Genet., 23, 151–162. Fischer,D., Lin,S.L., Wolfson,H.J. and Nussinov,R. (1995) J. Mol. Biol., 248, 459–477. Gandini,D., Gogioso,L., Bolognesi,M. and Bordo,D. (1996) Proteins: Struct. Funct. Genet., 24, 439–449. Gao,J., Mammen,M. and Whitesides,G.M. (1996) Science, 272, 535–537. Helms,V. and Wade,R.C. (1995) Biophys. J., 69, 810–824. Hendsch,Z.S., Jonsson,T., Sauer,R.T. and Tidor,B. (1996) Biochemistry, 35, 7621–7625. Hendsch,Z.S. and Tidor,B. (1994) Protein Sci., 3, 211–226. Honig,B. and Yang,A.S. (1995) Adv. Protein Chem., 46, 27–59. Hubbard,S. (1992) ACCESS. EMBL. Hubbard,S.J., Campbell,S.F. and Thornton,J.M. (1991) J. Mol. Biol., 220, 507–530. Janin,J. and Chothia,C. (1990) J. Biol. Chem., 265, 16027–16030. Janin,J. and Rodier,F. (1995) Proteins: Struct. Funct. Genet., 23, 580–587. Jeffrey,G.A. and Saenger,W. (1991) Hydrogen Bonding in Biological Structure. Springer, Berlin. Korn,A.P. and Burnett,R.M. (1991) Proteins: Struct. Funct. Genet., 9, 37–55. Lee,B. and Richards,F.M. (1971) J. Mol. Biol., 55, 379–400. Lin,S.L., Tsai,C.J. and Nussinov,R. (1995) J. Mol. Biol., 248, 151–161. Lin,S.L., Xu,D., Li,A., Roiterst,M., Wolfson,H.J. and Nussinov,R. (1997) Lybrand,T.P. (1995) Curr. Opin. Struct. Biol., 5, 224–228. Marqusee,S. and Sauer,R.T. (1994) Protein Sci., 3, 2217–2225. McDonald,I., Naylor,D., Jones,D. and Thornton,J. (1993) HBPLUS: Hydrogen Bond Calculator Version 2.25.
University College London, London. McDonald,I.K. and Thornton,J.M. (1994) J. Mol. Biol., 238, 777–793. Meyer,E. (1992) Protein Sci., 1, 1543–1562. Meyer,M., Wilson,P. and Schomburg,D. (1996) J. Mol. Biol., 264, 199–210. Molecular Simulations (1994) QUANTA 4.0. Molecular Simulations, Burlington, MA. Musafia,B., Buchner,V. and Arad,D. (1995) J. Mol. Biol., 254, 761–770. Myers,J.K. and Pace,C.N. (1996) Biophys. J., 71, 2033–2039. Norel,R., Lin,S.L., Wolfson,H.J. and Nussinov,R. (1994) Biopolymers, 34, 933–940. Norel,R., Lin,S.L., Wolfson,H.J. and Nussinov,R. (1995) J. Mol. Biol., 252, 263–273. Norel,R., Lin,S.L., Xu,D., Wolfson,H.J. and Nussinov,R. (1997) Submitted. Novotny,J. and Sharp,K. (1992) Prog. Biophys. Mol. Biol., 58, 203–224. Rashin,A. and Honig,B. (1984) J. Mol. Biol., 173, 515–521. Ringe,D. (1995) Curr. Opin. Struct. Biol., 5, 825–829. Shirley,B.A., Stanssens,P., Hahn,U. and Pace,C.N. (1992) Biochemistry, 31, 725–732. Tissot,A.C., Vuilleumier,S. and Fersht,A.R. (1996) Biochemistry, 35, 6786– 6794. Tsai,C.J. (1996) Protein–Protein Interface. Laboratory of Mathematical Biology World Wide Web (WWW) page, hhtp://www-lmmb.ncifcrf.gov/ tsai/. Tsai,C.J., Lin,S.L., Wolfson,H. and Nussinov,R. (1996) J. Mol. Biol., 260, 604–620. Tsai,C.J., Lin,S.L., Wolfson,H. and Nussinov,R. (1997) Protein Sci., 6, 53–64. Tsai,C.J. and Nussinov,R. (1997) Protein Sci., 6, 24–42. Vakser,I.A. (1996) Protein Engng, 9, 741–744. Warshel,A. and Russell,S.T. (1984) Q. Rev. Biophys., 17, 283–422. Wilson,C., Mau,T., Weisgraber,K.H., Wardell,M.R., Mahley,R.W. and Agard,D.A. (1994) Structure, 2, 713–718. Wolynes,P.G. (1990) In Stein,D. (ed.), Spin Glasses and Biology. World Scientific, Singapore. Xu,D., Lin,S.L. and Nussinov,R. (1997) J. Mol. Biol., 265, 68–84. Xu,R.M., Carmel,G., Kuret,J. and Cheng,X. (1996) Proc. Natl Acad. Sci. USA, 93, 6308–6313. Received January 22, 1997; revised March 28, 1997; accepted May 5, 1997