The role of geometric complementarity in secondary structure packing: A systematic docking study SULIN JIANG,1 ANDREI TOVCHIGRECHKO,2 AND ILYA A. VAKSER2 1 Department of Biochemistry, Weill Medical College of Cornell University, New York, New York 10021, USA Bioinformatics Laboratory, Department of Applied Mathematics and Statistics, State University of New York at Stony Brook, Stony Brook, New York 11794-3600, USA 2 (RECEIVED January 24, 2003; ACCEPTED April 25, 2003) Abstract A strong similarity between the major aspects of protein folding and protein recognition is one of the emerging fundamental principles in protein science. A crucial importance of steric complementarity in protein recognition is a well-established fact. The goal of this study was to assess the importance of the steric complementarity in protein folding, namely, in the packing of the secondary structure elements. Although the tight packing of protein structures, in general, is a well-known fact, a systematic study of the role of geometric complementarity in the packing of secondary structure elements has been lacking. To assess the role of the steric complementarity, we used a docking procedure to recreate the crystallographically determined packing of secondary structure elements in known protein structures by using the geometric match only. The docking results revealed a significant percentage of correctly predicted packing configurations. Different types of pairs of secondary structure elements showed different degrees of steric complementarity (from high to low: –, loop–loop, ␣–␣, and ␣–). Interestingly, the relative contribution of the steric match in different types of pairs was correlated with the number of such pairs in known protein structures. This effect may indicate an evolutionary pressure to select tightly packed elements of secondary structure to maximize the packing of the entire structure. The overall conclusion is that the steric match plays an essential role in the packing of secondary structure elements. The results are important for better understanding of principles of protein structure and may facilitate development of better methods for protein structure prediction. Keywords: Docking; protein modeling; molecular recognition; protein folding; structural bioinformatics Principles of protein folding and protein recognition are the basis for understanding life processes at the molecular level. They also provide the foundation for the algorithms of predicting protein structure and interactions. The principles that determine the structure of individual proteins and the structure of protein complexes are amazingly similar (Tsai et al. 1996; Vajda et al. 1997; Keskin et al. 1998; Shoemaker et al. 2000; Glaser et al. 2001; Baker and Lim 2002). One of such principles is tight packing of structural eleReprint requests to: Ilya A. Vakser, Bioinformatics Laboratory, Department of Applied Mathematics and Statistics, State University of New York at Stony Brook, Stony Brook, NY 11794-3600, USA; e-mail: vakser@ams. sunysb.edu; fax: (631) 632-8490. Article and publication are at http://www.proteinscience.org/cgi/doi/ 10.1110/ps.0304503. 1646 ments inside proteins and at protein–protein interfaces (see Ponder and Richards 1987; Hubbard and Argos 1994). The interaction of secondary structure elements in protein structures may be formulated in terms of docking. The docking is traditionally considered to be a problem of matching two separate molecules. The main difference in matching secondary structure elements and matching separate molecules is in the constraints imposed by the environment. Earlier studies explored the applicability of docking to secondary structure packing (Ausiello et al. 1997; Yue and Dill 2000; Vakser and Jiang 2002). A multiplicity of physicochemical factors obviously plays a role in the packing of secondary structure elements in proteins and in the formation of protein complexes. However, the well-known tight packing of structural elements indicates the importance Protein Science (2003), 12:1646–1651. Published by Cold Spring Harbor Laboratory Press. Copyright © 2003 The Protein Society Geometric complementarity in structure packing of the geometric fit. The steric complementarity in protein interactions has been studied extensively (for a review, see Halperin et al. 2002). Obviously, the role of the steric complementarity in the interaction of secondary structure elements deserves similar attention. Earlier studies of this subject were primarily focused on helix–helix packing (see Richmond and Richards 1978; Cohen et al. 1979; Chothia et al. 1981; Murzin and Finkelstein 1988; Reddy and Blundell 1993; Walther et al. 1996). One of the reasons was the limited number of high-quality crystal structures. Another reason probably was a traditional biochemical view on interactions of secondary structure elements that largely neglected the geometric complementarity as an important factor (with the exception of helix–helix interactions). The examples may be found in basic biochemistry texts, which describe the packing of -strands as determined solely by hydrogen bonding. Earlier studies addressed the interaxial distances in ␣–␣, –, and ␣– packing with implications to comparative modeling of proteins (Nagarajaram et al. 1999; Reddy et al. 1999). However, a systematic study of the role of geometric complementarity in the packing of secondary structure elements is still lacking. In this article, we present such a study. A docking algorithm based on geometric complementarity is applied to a comprehensive database of secondary structure elements derived from the Protein Data Bank (PDB). The results show that the steric fit plays an important role in the interaction of the secondary structure elements. Docking of secondary structure Database of proteins The study is based on a comprehensive nonredundant set of protein structures from PDB. The Pdb㛭select database of Hobohm and Sander (1992, 1994) was used. It contains comprehensive sets of protein chains with homologous structures excluded at a given homology level. We use the most stringent 25% list, in which no two proteins have >25% sequence identity. We further removed the NMR structures and the low-resolution structures (>2.5 Å resolu- tion) from the list. The remaining 886 protein chains were used at the next step, for the selection of the secondary structure elements. Secondary structure elements The following procedure was used to select the pairs of interacting secondary structure elements. For each protein chain, the Dssp procedure (Kabsch and Sander 1983) was used to identify the secondary structure elements: ␣-helix, -strand, or a loop. The minimal number of residues was required to be four in a -strand and three in a loop. Helices containing less than six residues were excluded to guarantee the presence of at least two turns. A pair of secondary structure elements was considered interacting if three or more residues on each side were in contact. The residues were considered to be in contact if their C − C distance (C␣ for Gly) was ⱕ6 Å. Only pairs with a residue contact ratio (number of residues in contact divided by the total number of residues in the pair) ⱖ30% were selected for this study. Table 1 shows the number of pairs of secondary structure elements in contact. Docking Our docking program GRAMM (http://reco3.ams.sunysb. edu/gramm) was used to match the secondary structure elements in the database. GRAMM performs an exhaustive grid-based search for surface complementarity of molecules or molecular fragments. At a given angular orientation of the molecular structures, all relative translations of the structures (within the accuracy of the grid) are tested by a correlation procedure using fast Fourier transformation. The search is repeated at all angular orientations. The score of a match in GRAMM is numerically equivalent to the intermolecular atom–atom energy based on a step-function approximation of the Lennard-Jones potential. For each pair of docked molecular structures, GRAMM outputs a list of lowest-energy matches sorted according to the energy (from low to high). Lower energy values correspond to a better geometric fit. GRAMM has been described in detail in a number of publications (Katchalski-Katzir et al. 1992; Vak- Table 1. Number of pairs of secondary structure elements Interface/total surface ratio Structures ␣–␣ 30% 40% 443 (182) 161 (92) 50% 60% 70% 80% 90% 100% Total 25 (19) 2 (2) 1 (1) 0 0 0 632 (296) – 41 (13) 137 (66) 328 (213) 594 (421) 787 (478) 1074 (858) 559 (493) 489 (412) 4009 (2954) ␣– 86 (65) 41 (17) 13 (7) 1 (1) 1 (1) 0 0 0 142 (91) loop–loop 430 (204) 305 (174) 219 (144) 117 (82) 58 (48) 27 (25) 3 (3) 35 (30) 1194 (710) Total 1000 644 585 714 847 1101 562 524 5977 Numbers in parentheses are the numbers of pairs with correctly predicted matches in 1000 lowest-energy configurations. www.proteinscience.org 1647 Jiang et al. Figure 1. Examples of pairs of secondary structure elements. The experimentally observed and the predicted configurations of the pairs show a high degree of steric complementarity. tion (RMSD) from the crystal structure of the complex was calculated. The matches with RMSD ⱕ2.5 Å were considered correct. If there were more than one correct match for a pair, the lowest-energy match was picked as the final result for that pair. The numbers in the parentheses in Table 1 are the number of pairs that had correct results within the 1000 lowest-energy matches. The docking results are shown in Figure 2. The percentage of pairs with a correct match at a given energy rank is plotted against the energy rank. The results show that the percentage of the correct matches is strongly correlated with the energy values. The percentage grows dramatically for the lowest-energy matches (to a lesser extent for ␣– pairs). Later, we show that a part of this trend is based on our selection protocol (selection of a single lowest-energy match out several correct matches per pair). However, the observed trend is far stronger that the one introduced by this bias in the selection of matches. Therefore, because our energy is based purely on surface complementarity, this pronounced trend indicates the importance of the surface match in the interaction of secondary structure elements. For a further analysis, we separated the pairs into two subsets, according to the size of the interface. The largeinterface pairs were defined as those with the following interface/total surface ratio (see Table 1): ⱖ40% for ␣–␣, ⱖ80% for –, ⱖ40% for ␣–, and ⱖ50% for loop–loop. ser and Aflalo 1994; Vakser 1995, 1997) and has been extensively tested and verified by experiments. GRAMM allows docking at different resolutions according to the accuracy of the molecular fragments. Because in this study the docking objects were relatively small high-resolution molecular fragments, the high-resolution docking (grid steps in the atom-size range) was applied. The docking was performed with 2 Å grid step and 12° angular interval. Results and Discussion The goal of this study was to determine the role of the steric complementarity in interaction of secondary structure elements. A tight geometric match is obvious from the visual analysis of the secondary structure packing (Fig. 1). However, for an objective evaluation of the role of the steric match one has to show that (1) it is possible to predict the correct packing of the secondary structure elements by a procedure based on the geometric fit alone, and (2) such predictions are nonrandom. Such computations were conducted for four sets of pairs of secondary structure elements: ␣-helix–␣-helix, -strand–-strand, ␣-helix–-strand, and loop–loop (Fig. 1). Predictions based on surface complementarity For each of the 1000 lowest-energy matches of a pair of secondary structure elements, the root mean square devia1648 Protein Science, vol. 12 Figure 2. Percentage of pairs with correctly predicted matches at a given energy rank. The “energy” is based on surface match only (for details, see text). The percentage is an average in groups of 20 ranks. The red line corresponds to all pairs of the specific type (␣-helix–␣-helix, -strand–strand, ␣-helix–, loop–loop, as indicated on each panel). The green and the blue lines indicate pairs with small and large interfaces, respectively (for the definition of small and large interfaces, see text). Geometric complementarity in structure packing The rest of the pairs were assigned to the small-interface subset. The criteria for the large and the small interfaces were chosen to make the sets approximately equally populated. The results in Figure 2 show that the trend of higher percentage of lowest-energy matches is naturally stronger for the larger-surface interfaces. Significance of the predictions: Comparison with random models To prove the significance of the steric complementarity revealed by the results in Figure 2, we compared the actual docking results with would be random ones. A detailed study of random models in molecular recognition was published earlier (Tovchigrechko and Vakser 2001). In the present study, we use two models: (1) random docking and (2) random energy rank. Model 1: Random docking In this model, the secondary structure elements are approximated by cylinders (␣-helices and -strands) and spheres (loops, due to their irregular shapes). The model is illustrated in Figure 3. To eliminate no-contact cases, we set the interaxial distance for docked pairs at 10 Å (Reddy and Figure 3. Model 1: Random docking. (A) ␣–␣, –, and ␣– pairs are represented by cylinders. The length of the cylinders and the distance between the axes is 10 Å. (B) Loop–loop pairs are represented by spheres with radii at 10 Å. The dark areas are the range of the correct match. For better visualization, the coordinates corresponding to only two degrees of freedom are shown (for details, see text). Blundell 1993; Nagarajaram et al. 1999; Reddy et al. 1999). The correct match region was defined as ⱕ2.5 Å from the crystal structure position (approximately in accordance with our definition of the correct match in the actual docking). Such a shift is equivalent to ∼30° change in orientation, out of possible 360° (Fig. 3A). To avoid no-contact cases, an up-and-down translation along the axes of the 10-Å cylinders (the minimum length of the secondary structure elements in this study) is possible to a maximum of 10 Å. If the correct match area is ⱕ2.5 Å in each direction, the chance of hitting the correct area is one out of four. The correct match corresponds to the ∼30° spin around the axis of the second helix and to the ∼30° angle between the axes of the helices. Combining these four coordinates, we obtain 0.02% as the upper limit for the probability of the correct match in this random docking model. This probability of a random success can be used to estimate the statistical significance of the actual docking results. The model of loop–loop docking as a match of two spherical bodies is shown in Figure 3B. The smallest loop fragment with three residues has a diameter of ∼10 Å. Thus, the distance between the centers of the spheres was set as 10 Å. The correct match area for the center of mass of the second loop is defined as a circle with the 2.5 Å radius around the position in the crystal structure. This area is ∼1.6% of the entire area of the sphere. For each position of the center of mass of the second loop, there are different rotational orientations (e.g., defined by three Euler angles). If we conservatively consider the average distance from an atom in this loop to the center of mass of the loop as 2.5 Å, then the maximal angle of rotation corresponding to the 2.5-Å RMSD must be ∼60°. The number of nonredundant combinations of three Euler angles forming a grid of 60° rotations is 79 (the number is taken from GRAMM angular parameter file; I.A. Vakser, unpubl.). Thus, combining the probabilities for translational and rotational degrees of freedom, we get 0.02% as the upper limit on the probability of a random correct prediction for loop–loop docking. Model 2: Random energy rank If a docking run for a pair of secondary structure elements resulted in more than one correct match within 1000 lowestenergy solutions, only one match (the one with the lowest energy) was included in the statistics in Figure 2. This selection criterion should have resulted in somewhat inflated percentages of the lowest-energy matches. To reveal the extent of this artifact, we randomized the energy rank in the lists of 1000 lowest-energy matches. These new distributions of random-rank matches were compared with the actual distributions (Fig. 4). The actual percentages of the lowest-energy matches (ranks 1–20) were significantly larger than the random ones, with the exception of ␣– pairs. A possible reason for the absence of the trend that goes beyond the random level in Model 2, in the case of www.proteinscience.org 1649 Jiang et al. for the top 100 matches, 2%. Interestingly, the probabilities of correct predictions for different types of pairs (–, loop–loop, ␣–␣, ␣–) are strongly correlated with the number of pairs available in each case (Table 1). This effect may represent the evolutionary pressure on protein structures to select tightly packed elements of secondary structure to maximize the packing of the entire protein. Conclusions Figure 4. Model 2: Random energy rank. Comparison of docking results with the actual energy rank (black line) and those with the random rank (gray bars). The percentage is an average in groups of 20 ranks. The percentage of the matches with the actual energy rank is higher than in Figure 2 because, due to the nature of this model, only the pairs with correct predictions were taken into account. ␣– packing, may be the small number of pairs (see Table 1). However, one also has to consider a possibility that the small number of ␣– pairs in known protein structures may be the result of their loose packing, which would contribute unfavorably to a tight packing of protein structures. It is important to point out that these considerations do not affect the ability of GRAMM to correctly dock and identify the low-energy matches based on geometric complementarity alone, including the ␣– pairs. A strong similarity between the major aspects of protein folding and protein recognition is one of the emerging fundamental principles in protein science. A crucial importance of steric complementarity in protein recognition is a wellestablished fact. The goal of this study was to assess the importance of the steric complementarity in protein folding, namely, in the packing of the secondary structure elements. Although the tight packing of protein structures, in general, is a well-known fact, a systematic study of the role of geometric complementarity in the packing of secondary structure elements has been lacking. To assess the role of the steric complementarity, we used a docking procedure to recreate the crystallographically determined packing of secondary structure elements by using the geometric match only. The pairs of secondary structure elements (␣-helix– ␣-helix, -strand–-strand, ␣-helix–-strand, and loop– loop) were taken from a nonredundant database of known protein structures. The docking results revealed a significant percentage of correctly predicted packing configurations. The significance of the results was verified by comparison with random docking models. Different types of pairs of secondary structure elements showed different degrees of steric complementarity (from high to low: –, loop–loop, Analysis of the correct match probability The chances of finding the correct match by the geometric match procedure are shown in Figure 5. The probabilities are shown for the first 10, 20, 50, and 100 predicted matches. The highest probabilities are for the – packing, followed by the loop–loop packing and then by ␣–␣ packing. The lowest probabilities are for the ␣– packing. As shown above, the results for the ␣– pairs are within the random range in Model 2. However, they are still significantly higher than the random range in Model 1. If the probability of a single correct match is 0.02% as in Model 1, then the probability of having at least one correct match among the top N matches randomly is 1 − [1 − 0.0002]N ≈ 0.02% × N. Thus, for the top 10 matches, the probability is 0.2%; for the top 20 matches, 0.4%; for the top 50, 1%, and 1650 Protein Science, vol. 12 Figure 5. Probabilities of the correct match in low-energy predictions. For comparison, probabilities of the correct match obtained randomly are as follows: For the top 10 matches, 0.2%; for the top 20 matches, 0.4%; for the top 50, 1%; and for the top 100 matches, 2% (see text). Geometric complementarity in structure packing ␣–␣, ␣–). Interestingly, the relative contribution of the steric match in different types of pairs was correlated with the number of such pairs in known protein structures. This effect may indicate an evolutionary pressure to select tightly packed elements of secondary structure to maximize the packing of the entire structure. The overall conclusion is that the steric match plays an essential role in the packing of secondary structure elements. The results are important for better understanding of principles of protein structure and may facilitate development of better methods for protein structure prediction. Acknowledgments The study was supported by NIH R01 GM61889 and NSF DBI9808093 grants. The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact. References Ausiello, G., Cesareni, G., and Helmer-Citterich, M. 1997. ESCHER: A new docking procedure applied to the reconstruction of protein tertiary structure. Proteins 28: 556–567. Baker, D. and Lim, W.A. 2002. Folding and binding: From folding towards function. Curr. Opin. Struct. Biol. 12: 11–13. Chothia, C., Levitt, M., and Richardson, D. 1981. Helix to helix packing in proteins. J. Mol. Biol. 145: 215–250. Cohen, F.E., Richmond, T.J., and Richards, F.M. 1979. Protein folding: Evaluation of some simple rules for the assembly of helices into tertiary structures with myoglobin as an example. J. Mol. Biol. 132: 275–288. Glaser, F., Steinberg, D., Vakser, I.A., and Ben-Tal, N. 2001. Residue frequencies and pairing preferences at protein–protein interfaces. Proteins 43: 89– 102. Halperin, I., Ma, B., Wolfson, H., and Nussinov, R. 2002. Principles of docking: An overview of search algorithms and a guide to scoring functions. Proteins 47: 409–443. Hobohm, U. and Sander, C. 1992. Selection of a representative set of structures from the Brookhaven Protein Data Bank. Protein Sci. 1: 409–417. ———. 1994. Enlarged representative set of protein structures. Protein Sci. 3: 522–524. Hubbard, S.J. and Argos, P. 1994. Cavities and packing at protein interfaces. Protein Sci. 3: 2194–2206. Kabsch, W. and Sander, C. 1983. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22: 2577–2637. Katchalski-Katzir, E., Shariv, I., Eisenstein, M., Friesem, A.A., Aflalo, C., and Vakser, I.A. 1992. Molecular surface recognition: Determination of geometric fit between proteins and their ligands by correlation techniques. Proc. Natl. Acad. Sci. 89: 2195–2199. Keskin, O., Bahar, I., Badretdinov, A.Y., Ptitsyn, O.B., and Jernigan, R.L. 1998. Empirical solvent-mediated potentials hold for both intra-molecular and inter-molecular inter-residue interactions. Protein Sci. 7: 2578–2586. Murzin, A.G. and Finkelstein, A.V. 1988. General architecture of the ␣-helical globule. J. Mol. Biol. 204: 749–769. Nagarajaram, H.A., Reddy, B.V.B., and Blundell, T.L. 1999. Analysis and prediction of inter-strand packing distances between -sheets of globular proteins. Protein Eng. 12: 1055–1062. Ponder, J.W. and Richards, F.M. 1987. Internal packing and protein structural classes. In Cold Spring Harbor symposia on quantitative biology, Vol. LII, pp. 421–428. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY. Reddy, B.V.B. and Blundell, T.L. 1993. Packing of secondary structure elements in proteins: Analysis and prediction of inter-helix distances. J. Mol. Biol. 233: 464–479. Reddy, B.V.B., Nagarajaram, H.A., and Blundell, T.L. 1999. Analysis of interactive packing of secondary structural elements in ␣/ units in proteins. Protein Sci. 8: 573–586. Richmond, T.J. and Richards, F.M. 1978. Packing of ␣-helices: Geometrical constraints and contact areas. J. Mol. Biol. 119: 537–555. Shoemaker, B.A., Portman, J.J., and Wolynes, P.G. 2000. Speeding molecular recognition by using the folding funnel: The fly-casting mechanism. Proc. Natl. Acad. Sci. 97: 8868–8873. Tovchigrechko, A. and Vakser, I.A. 2001. How common is the funnel-like energy landscape in protein–protein interactions? Protein Sci. 10: 1572– 1583. Tsai, C.-J., Lin, S.L., Wolfson, H.J., and Nussinov, R. 1996. Protein–protein interfaces: Architectures and interactions in protein–protein interfaces and in protein cores: Their similarities and differences. Crit. Rev. Biochem. Mol. Biol. 31: 127–152. Vajda, S., Sippl, M., and Novotny, J. 1997. Empirical potentials and functions for protein folding and binding. Curr. Opin. Struct. Biol. 7: 222–228. Vakser, I.A. 1995. Protein docking for low-resolution structures. Protein Eng. 8: 371–377. ———. 1997. Evaluation of GRAMM low-resolution docking methodology on the hemagglutinin-antibody complex. Proteins 1(Suppl): 226–230. Vakser, I.A. and Aflalo, C. 1994. Hydrophobic docking: A proposed enhancement to molecular recognition techniques. Proteins 20: 320–329. Vakser, I.A. and Jiang, S. 2002. Strategies for modeling the interactions of the transmembrane helices of G protein–coupled receptors by geometric complementarity using the GRAMM computer algorithm. Methods Enzymol. 343: 313–328. Walther, D., Eisenhaber, F., and Argos, P. 1996. Principles of helix–helix packing in proteins: The helical lattice superposition model. J. Mol. Biol. 255: 536–553. Yue, K. and Dill, K.A. 2000. Constraint-based assembly of tertiary protein structures from secondary structure elements. Protein Sci. 9: 1935–1946. www.proteinscience.org 1651
© Copyright 2026 Paperzz