The role of geometric complementarity in secondary structure

The role of geometric complementarity in secondary
structure packing: A systematic docking study
SULIN JIANG,1 ANDREI TOVCHIGRECHKO,2
AND
ILYA A. VAKSER2
1
Department of Biochemistry, Weill Medical College of Cornell University, New York, New York 10021, USA
Bioinformatics Laboratory, Department of Applied Mathematics and Statistics, State University of New York
at Stony Brook, Stony Brook, New York 11794-3600, USA
2
(RECEIVED January 24, 2003; ACCEPTED April 25, 2003)
Abstract
A strong similarity between the major aspects of protein folding and protein recognition is one of the
emerging fundamental principles in protein science. A crucial importance of steric complementarity in
protein recognition is a well-established fact. The goal of this study was to assess the importance of the steric
complementarity in protein folding, namely, in the packing of the secondary structure elements. Although
the tight packing of protein structures, in general, is a well-known fact, a systematic study of the role of
geometric complementarity in the packing of secondary structure elements has been lacking. To assess the
role of the steric complementarity, we used a docking procedure to recreate the crystallographically determined packing of secondary structure elements in known protein structures by using the geometric match
only. The docking results revealed a significant percentage of correctly predicted packing configurations.
Different types of pairs of secondary structure elements showed different degrees of steric complementarity
(from high to low: ␤–␤, loop–loop, ␣–␣, and ␣–␤). Interestingly, the relative contribution of the steric match
in different types of pairs was correlated with the number of such pairs in known protein structures. This
effect may indicate an evolutionary pressure to select tightly packed elements of secondary structure to
maximize the packing of the entire structure. The overall conclusion is that the steric match plays an
essential role in the packing of secondary structure elements. The results are important for better understanding of principles of protein structure and may facilitate development of better methods for protein
structure prediction.
Keywords: Docking; protein modeling; molecular recognition; protein folding; structural bioinformatics
Principles of protein folding and protein recognition are the
basis for understanding life processes at the molecular level.
They also provide the foundation for the algorithms of predicting protein structure and interactions. The principles
that determine the structure of individual proteins and the
structure of protein complexes are amazingly similar (Tsai
et al. 1996; Vajda et al. 1997; Keskin et al. 1998; Shoemaker et al. 2000; Glaser et al. 2001; Baker and Lim 2002).
One of such principles is tight packing of structural eleReprint requests to: Ilya A. Vakser, Bioinformatics Laboratory, Department of Applied Mathematics and Statistics, State University of New York
at Stony Brook, Stony Brook, NY 11794-3600, USA; e-mail: vakser@ams.
sunysb.edu; fax: (631) 632-8490.
Article and publication are at http://www.proteinscience.org/cgi/doi/
10.1110/ps.0304503.
1646
ments inside proteins and at protein–protein interfaces (see
Ponder and Richards 1987; Hubbard and Argos 1994).
The interaction of secondary structure elements in protein
structures may be formulated in terms of docking. The
docking is traditionally considered to be a problem of
matching two separate molecules. The main difference in
matching secondary structure elements and matching separate molecules is in the constraints imposed by the environment. Earlier studies explored the applicability of docking
to secondary structure packing (Ausiello et al. 1997; Yue
and Dill 2000; Vakser and Jiang 2002). A multiplicity of
physicochemical factors obviously plays a role in the packing of secondary structure elements in proteins and in the
formation of protein complexes. However, the well-known
tight packing of structural elements indicates the importance
Protein Science (2003), 12:1646–1651. Published by Cold Spring Harbor Laboratory Press. Copyright © 2003 The Protein Society
Geometric complementarity in structure packing
of the geometric fit. The steric complementarity in protein
interactions has been studied extensively (for a review, see
Halperin et al. 2002). Obviously, the role of the steric
complementarity in the interaction of secondary structure
elements deserves similar attention.
Earlier studies of this subject were primarily focused on
helix–helix packing (see Richmond and Richards 1978; Cohen et al. 1979; Chothia et al. 1981; Murzin and Finkelstein
1988; Reddy and Blundell 1993; Walther et al. 1996). One
of the reasons was the limited number of high-quality crystal structures. Another reason probably was a traditional
biochemical view on interactions of secondary structure elements that largely neglected the geometric complementarity as an important factor (with the exception of helix–helix
interactions). The examples may be found in basic biochemistry texts, which describe the packing of ␤-strands as determined solely by hydrogen bonding. Earlier studies addressed the interaxial distances in ␣–␣, ␤–␤, and ␣–␤ packing with implications to comparative modeling of proteins
(Nagarajaram et al. 1999; Reddy et al. 1999). However, a
systematic study of the role of geometric complementarity
in the packing of secondary structure elements is still lacking. In this article, we present such a study. A docking
algorithm based on geometric complementarity is applied to
a comprehensive database of secondary structure elements
derived from the Protein Data Bank (PDB). The results
show that the steric fit plays an important role in the interaction of the secondary structure elements.
Docking of secondary structure
Database of proteins
The study is based on a comprehensive nonredundant set of
protein structures from PDB. The Pdb㛭select database of
Hobohm and Sander (1992, 1994) was used. It contains
comprehensive sets of protein chains with homologous
structures excluded at a given homology level. We use the
most stringent 25% list, in which no two proteins have
>25% sequence identity. We further removed the NMR
structures and the low-resolution structures (>2.5 Å resolu-
tion) from the list. The remaining 886 protein chains were
used at the next step, for the selection of the secondary
structure elements.
Secondary structure elements
The following procedure was used to select the pairs of
interacting secondary structure elements. For each protein
chain, the Dssp procedure (Kabsch and Sander 1983) was
used to identify the secondary structure elements: ␣-helix,
␤-strand, or a loop. The minimal number of residues was
required to be four in a ␤-strand and three in a loop. Helices
containing less than six residues were excluded to guarantee
the presence of at least two turns. A pair of secondary
structure elements was considered interacting if three or
more residues on each side were in contact. The residues
were considered to be in contact if their C␤ − C␤ distance
(C␣ for Gly) was ⱕ6 Å. Only pairs with a residue contact
ratio (number of residues in contact divided by the total
number of residues in the pair) ⱖ30% were selected for this
study. Table 1 shows the number of pairs of secondary
structure elements in contact.
Docking
Our docking program GRAMM (http://reco3.ams.sunysb.
edu/gramm) was used to match the secondary structure elements in the database. GRAMM performs an exhaustive
grid-based search for surface complementarity of molecules
or molecular fragments. At a given angular orientation of
the molecular structures, all relative translations of the
structures (within the accuracy of the grid) are tested by a
correlation procedure using fast Fourier transformation. The
search is repeated at all angular orientations. The score of a
match in GRAMM is numerically equivalent to the intermolecular atom–atom energy based on a step-function approximation of the Lennard-Jones potential. For each pair of
docked molecular structures, GRAMM outputs a list of lowest-energy matches sorted according to the energy (from
low to high). Lower energy values correspond to a better
geometric fit. GRAMM has been described in detail in a
number of publications (Katchalski-Katzir et al. 1992; Vak-
Table 1. Number of pairs of secondary structure elements
Interface/total surface ratio
Structures
␣–␣
30%
40%
443 (182)
161 (92)
50%
60%
70%
80%
90%
100%
Total
25 (19)
2 (2)
1 (1)
0
0
0
632 (296)
␤–␤
41 (13)
137 (66)
328 (213)
594 (421)
787 (478)
1074 (858)
559 (493)
489 (412)
4009 (2954)
␣–␤
86 (65)
41 (17)
13 (7)
1 (1)
1 (1)
0
0
0
142 (91)
loop–loop
430 (204)
305 (174)
219 (144)
117 (82)
58 (48)
27 (25)
3 (3)
35 (30)
1194 (710)
Total
1000
644
585
714
847
1101
562
524
5977
Numbers in parentheses are the numbers of pairs with correctly predicted matches in 1000 lowest-energy configurations.
www.proteinscience.org
1647
Jiang et al.
Figure 1. Examples of pairs of secondary structure elements. The experimentally observed and the predicted configurations of the pairs show a
high degree of steric complementarity.
tion (RMSD) from the crystal structure of the complex was
calculated. The matches with RMSD ⱕ2.5 Å were considered correct. If there were more than one correct match for
a pair, the lowest-energy match was picked as the final
result for that pair. The numbers in the parentheses in Table
1 are the number of pairs that had correct results within the
1000 lowest-energy matches.
The docking results are shown in Figure 2. The percentage of pairs with a correct match at a given energy rank is
plotted against the energy rank. The results show that the
percentage of the correct matches is strongly correlated with
the energy values. The percentage grows dramatically for
the lowest-energy matches (to a lesser extent for ␣–␤ pairs).
Later, we show that a part of this trend is based on our
selection protocol (selection of a single lowest-energy
match out several correct matches per pair). However, the
observed trend is far stronger that the one introduced by this
bias in the selection of matches. Therefore, because our
energy is based purely on surface complementarity, this
pronounced trend indicates the importance of the surface
match in the interaction of secondary structure elements.
For a further analysis, we separated the pairs into two
subsets, according to the size of the interface. The largeinterface pairs were defined as those with the following
interface/total surface ratio (see Table 1): ⱖ40% for ␣–␣,
ⱖ80% for ␤–␤, ⱖ40% for ␣–␤, and ⱖ50% for loop–loop.
ser and Aflalo 1994; Vakser 1995, 1997) and has been
extensively tested and verified by experiments. GRAMM
allows docking at different resolutions according to the accuracy of the molecular fragments. Because in this study the
docking objects were relatively small high-resolution molecular fragments, the high-resolution docking (grid steps in
the atom-size range) was applied. The docking was performed with 2 Å grid step and 12° angular interval.
Results and Discussion
The goal of this study was to determine the role of the steric
complementarity in interaction of secondary structure elements. A tight geometric match is obvious from the visual
analysis of the secondary structure packing (Fig. 1). However, for an objective evaluation of the role of the steric
match one has to show that (1) it is possible to predict the
correct packing of the secondary structure elements by a
procedure based on the geometric fit alone, and (2) such
predictions are nonrandom. Such computations were conducted for four sets of pairs of secondary structure elements:
␣-helix–␣-helix, ␤-strand–␤-strand, ␣-helix–␤-strand, and
loop–loop (Fig. 1).
Predictions based on surface complementarity
For each of the 1000 lowest-energy matches of a pair of
secondary structure elements, the root mean square devia1648
Protein Science, vol. 12
Figure 2. Percentage of pairs with correctly predicted matches at a given
energy rank. The “energy” is based on surface match only (for details, see
text). The percentage is an average in groups of 20 ranks. The red line
corresponds to all pairs of the specific type (␣-helix–␣-helix, ␤-strand–␤strand, ␣-helix–␤, loop–loop, as indicated on each panel). The green and
the blue lines indicate pairs with small and large interfaces, respectively
(for the definition of small and large interfaces, see text).
Geometric complementarity in structure packing
The rest of the pairs were assigned to the small-interface
subset. The criteria for the large and the small interfaces
were chosen to make the sets approximately equally populated. The results in Figure 2 show that the trend of higher
percentage of lowest-energy matches is naturally stronger
for the larger-surface interfaces.
Significance of the predictions:
Comparison with random models
To prove the significance of the steric complementarity revealed by the results in Figure 2, we compared the actual
docking results with would be random ones. A detailed
study of random models in molecular recognition was published earlier (Tovchigrechko and Vakser 2001). In the present study, we use two models: (1) random docking and (2)
random energy rank.
Model 1: Random docking
In this model, the secondary structure elements are approximated by cylinders (␣-helices and ␤-strands) and
spheres (loops, due to their irregular shapes). The model is
illustrated in Figure 3. To eliminate no-contact cases, we set
the interaxial distance for docked pairs at 10 Å (Reddy and
Figure 3. Model 1: Random docking. (A) ␣–␣, ␤–␤, and ␣–␤ pairs are
represented by cylinders. The length of the cylinders and the distance
between the axes is 10 Å. (B) Loop–loop pairs are represented by spheres
with radii at 10 Å. The dark areas are the range of the correct match. For
better visualization, the coordinates corresponding to only two degrees of
freedom are shown (for details, see text).
Blundell 1993; Nagarajaram et al. 1999; Reddy et al. 1999).
The correct match region was defined as ⱕ2.5 Å from the
crystal structure position (approximately in accordance with
our definition of the correct match in the actual docking).
Such a shift is equivalent to ∼30° change in orientation, out
of possible 360° (Fig. 3A). To avoid no-contact cases, an
up-and-down translation along the axes of the 10-Å cylinders (the minimum length of the secondary structure elements in this study) is possible to a maximum of 10 Å. If the
correct match area is ⱕ2.5 Å in each direction, the chance
of hitting the correct area is one out of four. The correct
match corresponds to the ∼30° spin around the axis of the
second helix and to the ∼30° angle between the axes of the
helices. Combining these four coordinates, we obtain 0.02%
as the upper limit for the probability of the correct match in
this random docking model. This probability of a random
success can be used to estimate the statistical significance of
the actual docking results.
The model of loop–loop docking as a match of two
spherical bodies is shown in Figure 3B. The smallest loop
fragment with three residues has a diameter of ∼10 Å. Thus,
the distance between the centers of the spheres was set as 10
Å. The correct match area for the center of mass of the
second loop is defined as a circle with the 2.5 Å radius
around the position in the crystal structure. This area is
∼1.6% of the entire area of the sphere. For each position of
the center of mass of the second loop, there are different
rotational orientations (e.g., defined by three Euler angles).
If we conservatively consider the average distance from an
atom in this loop to the center of mass of the loop as 2.5 Å,
then the maximal angle of rotation corresponding to the
2.5-Å RMSD must be ∼60°. The number of nonredundant
combinations of three Euler angles forming a grid of 60°
rotations is 79 (the number is taken from GRAMM angular
parameter file; I.A. Vakser, unpubl.). Thus, combining the
probabilities for translational and rotational degrees of freedom, we get 0.02% as the upper limit on the probability of
a random correct prediction for loop–loop docking.
Model 2: Random energy rank
If a docking run for a pair of secondary structure elements
resulted in more than one correct match within 1000 lowestenergy solutions, only one match (the one with the lowest
energy) was included in the statistics in Figure 2. This selection criterion should have resulted in somewhat inflated
percentages of the lowest-energy matches. To reveal the
extent of this artifact, we randomized the energy rank in the
lists of 1000 lowest-energy matches. These new distributions of random-rank matches were compared with the actual distributions (Fig. 4). The actual percentages of the
lowest-energy matches (ranks 1–20) were significantly
larger than the random ones, with the exception of ␣–␤
pairs. A possible reason for the absence of the trend that
goes beyond the random level in Model 2, in the case of
www.proteinscience.org
1649
Jiang et al.
for the top 100 matches, 2%. Interestingly, the probabilities
of correct predictions for different types of pairs (␤–␤,
loop–loop, ␣–␣, ␣–␤) are strongly correlated with the number of pairs available in each case (Table 1). This effect may
represent the evolutionary pressure on protein structures to
select tightly packed elements of secondary structure to
maximize the packing of the entire protein.
Conclusions
Figure 4. Model 2: Random energy rank. Comparison of docking results
with the actual energy rank (black line) and those with the random rank
(gray bars). The percentage is an average in groups of 20 ranks. The
percentage of the matches with the actual energy rank is higher than in
Figure 2 because, due to the nature of this model, only the pairs with
correct predictions were taken into account.
␣–␤ packing, may be the small number of pairs (see Table
1). However, one also has to consider a possibility that the
small number of ␣–␤ pairs in known protein structures may
be the result of their loose packing, which would contribute
unfavorably to a tight packing of protein structures. It is
important to point out that these considerations do not affect
the ability of GRAMM to correctly dock and identify the
low-energy matches based on geometric complementarity
alone, including the ␣–␤ pairs.
A strong similarity between the major aspects of protein
folding and protein recognition is one of the emerging fundamental principles in protein science. A crucial importance
of steric complementarity in protein recognition is a wellestablished fact. The goal of this study was to assess the
importance of the steric complementarity in protein folding,
namely, in the packing of the secondary structure elements.
Although the tight packing of protein structures, in general,
is a well-known fact, a systematic study of the role of geometric complementarity in the packing of secondary structure elements has been lacking. To assess the role of the
steric complementarity, we used a docking procedure to
recreate the crystallographically determined packing of secondary structure elements by using the geometric match
only. The pairs of secondary structure elements (␣-helix–
␣-helix, ␤-strand–␤-strand, ␣-helix–␤-strand, and loop–
loop) were taken from a nonredundant database of known
protein structures. The docking results revealed a significant
percentage of correctly predicted packing configurations.
The significance of the results was verified by comparison
with random docking models. Different types of pairs of
secondary structure elements showed different degrees of
steric complementarity (from high to low: ␤–␤, loop–loop,
Analysis of the correct match probability
The chances of finding the correct match by the geometric
match procedure are shown in Figure 5. The probabilities
are shown for the first 10, 20, 50, and 100 predicted
matches. The highest probabilities are for the ␤–␤ packing,
followed by the loop–loop packing and then by ␣–␣ packing. The lowest probabilities are for the ␣–␤ packing. As
shown above, the results for the ␣–␤ pairs are within the
random range in Model 2. However, they are still significantly higher than the random range in Model 1. If the
probability of a single correct match is 0.02% as in Model
1, then the probability of having at least one correct match
among the top N matches randomly is 1 − [1 − 0.0002]N ≈
0.02% × N. Thus, for the top 10 matches, the probability is
0.2%; for the top 20 matches, 0.4%; for the top 50, 1%, and
1650
Protein Science, vol. 12
Figure 5. Probabilities of the correct match in low-energy predictions. For
comparison, probabilities of the correct match obtained randomly are as
follows: For the top 10 matches, 0.2%; for the top 20 matches, 0.4%; for
the top 50, 1%; and for the top 100 matches, 2% (see text).
Geometric complementarity in structure packing
␣–␣, ␣–␤). Interestingly, the relative contribution of the
steric match in different types of pairs was correlated with
the number of such pairs in known protein structures. This
effect may indicate an evolutionary pressure to select tightly
packed elements of secondary structure to maximize the
packing of the entire structure. The overall conclusion is
that the steric match plays an essential role in the packing of
secondary structure elements. The results are important for
better understanding of principles of protein structure and
may facilitate development of better methods for protein
structure prediction.
Acknowledgments
The study was supported by NIH R01 GM61889 and NSF DBI9808093 grants.
The publication costs of this article were defrayed in part by
payment of page charges. This article must therefore be hereby
marked “advertisement” in accordance with 18 USC section 1734
solely to indicate this fact.
References
Ausiello, G., Cesareni, G., and Helmer-Citterich, M. 1997. ESCHER: A new
docking procedure applied to the reconstruction of protein tertiary structure.
Proteins 28: 556–567.
Baker, D. and Lim, W.A. 2002. Folding and binding: From folding towards
function. Curr. Opin. Struct. Biol. 12: 11–13.
Chothia, C., Levitt, M., and Richardson, D. 1981. Helix to helix packing in
proteins. J. Mol. Biol. 145: 215–250.
Cohen, F.E., Richmond, T.J., and Richards, F.M. 1979. Protein folding: Evaluation of some simple rules for the assembly of helices into tertiary structures
with myoglobin as an example. J. Mol. Biol. 132: 275–288.
Glaser, F., Steinberg, D., Vakser, I.A., and Ben-Tal, N. 2001. Residue frequencies and pairing preferences at protein–protein interfaces. Proteins 43: 89–
102.
Halperin, I., Ma, B., Wolfson, H., and Nussinov, R. 2002. Principles of docking:
An overview of search algorithms and a guide to scoring functions. Proteins
47: 409–443.
Hobohm, U. and Sander, C. 1992. Selection of a representative set of structures
from the Brookhaven Protein Data Bank. Protein Sci. 1: 409–417.
———. 1994. Enlarged representative set of protein structures. Protein Sci. 3:
522–524.
Hubbard, S.J. and Argos, P. 1994. Cavities and packing at protein interfaces.
Protein Sci. 3: 2194–2206.
Kabsch, W. and Sander, C. 1983. Dictionary of protein secondary structure:
Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22: 2577–2637.
Katchalski-Katzir, E., Shariv, I., Eisenstein, M., Friesem, A.A., Aflalo, C., and
Vakser, I.A. 1992. Molecular surface recognition: Determination of geometric fit between proteins and their ligands by correlation techniques.
Proc. Natl. Acad. Sci. 89: 2195–2199.
Keskin, O., Bahar, I., Badretdinov, A.Y., Ptitsyn, O.B., and Jernigan, R.L. 1998.
Empirical solvent-mediated potentials hold for both intra-molecular and
inter-molecular inter-residue interactions. Protein Sci. 7: 2578–2586.
Murzin, A.G. and Finkelstein, A.V. 1988. General architecture of the ␣-helical
globule. J. Mol. Biol. 204: 749–769.
Nagarajaram, H.A., Reddy, B.V.B., and Blundell, T.L. 1999. Analysis and
prediction of inter-strand packing distances between ␤-sheets of globular
proteins. Protein Eng. 12: 1055–1062.
Ponder, J.W. and Richards, F.M. 1987. Internal packing and protein structural
classes. In Cold Spring Harbor symposia on quantitative biology, Vol. LII,
pp. 421–428. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.
Reddy, B.V.B. and Blundell, T.L. 1993. Packing of secondary structure elements in proteins: Analysis and prediction of inter-helix distances. J. Mol.
Biol. 233: 464–479.
Reddy, B.V.B., Nagarajaram, H.A., and Blundell, T.L. 1999. Analysis of interactive packing of secondary structural elements in ␣/␤ units in proteins.
Protein Sci. 8: 573–586.
Richmond, T.J. and Richards, F.M. 1978. Packing of ␣-helices: Geometrical
constraints and contact areas. J. Mol. Biol. 119: 537–555.
Shoemaker, B.A., Portman, J.J., and Wolynes, P.G. 2000. Speeding molecular
recognition by using the folding funnel: The fly-casting mechanism. Proc.
Natl. Acad. Sci. 97: 8868–8873.
Tovchigrechko, A. and Vakser, I.A. 2001. How common is the funnel-like
energy landscape in protein–protein interactions? Protein Sci. 10: 1572–
1583.
Tsai, C.-J., Lin, S.L., Wolfson, H.J., and Nussinov, R. 1996. Protein–protein
interfaces: Architectures and interactions in protein–protein interfaces and
in protein cores: Their similarities and differences. Crit. Rev. Biochem. Mol.
Biol. 31: 127–152.
Vajda, S., Sippl, M., and Novotny, J. 1997. Empirical potentials and functions
for protein folding and binding. Curr. Opin. Struct. Biol. 7: 222–228.
Vakser, I.A. 1995. Protein docking for low-resolution structures. Protein Eng.
8: 371–377.
———. 1997. Evaluation of GRAMM low-resolution docking methodology on
the hemagglutinin-antibody complex. Proteins 1(Suppl): 226–230.
Vakser, I.A. and Aflalo, C. 1994. Hydrophobic docking: A proposed enhancement to molecular recognition techniques. Proteins 20: 320–329.
Vakser, I.A. and Jiang, S. 2002. Strategies for modeling the interactions of the
transmembrane helices of G protein–coupled receptors by geometric
complementarity using the GRAMM computer algorithm. Methods Enzymol. 343: 313–328.
Walther, D., Eisenhaber, F., and Argos, P. 1996. Principles of helix–helix packing in proteins: The helical lattice superposition model. J. Mol. Biol. 255:
536–553.
Yue, K. and Dill, K.A. 2000. Constraint-based assembly of tertiary protein
structures from secondary structure elements. Protein Sci. 9: 1935–1946.
www.proteinscience.org
1651