
Statistical Torsion Angle Potential (STAP) Energy Functions
for Protein Structure Modeling: A Bicubic Interpolation
Approach
Tae-Rae Kim1, Joshua SungWoo Yang2,3, Seokmin Shin1, and Jinhyuk Lee2,3*
1 Department of Chemistry, Seoul National University, Seoul 151-747, the Republic of Korea
2 Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology, 125 Gwahak-ro, Yuseong-gu, Daejeon 305-806, the Republic of Korea
3 Department of Bioinformatics, University of Science and Technology, 217 Gajung-ro, Yuseong-gu, Daejeon 305-350, the Republic of Korea
Keywords: torsion angle, statistical potential, structure modeling, bicubic interpolation
* Corresponding author: Dr. Lee. Tel: +82-42-879-8530; Fax: +82-42-879-8519; E-mail: [email protected]
This Supporting Information addresses the following: the justification for assuming a Boltzmann distribution of
torsion angles, illustrations of the potential energy functions, the full list of test sets, and the full graphs used
in the discussion. The structures modeled under the FF22 and the CMAP conditions are also visualized for
comparison. The effect of calibration by removing the torsion energy terms of the CHARMM22 force field,
although not an exact calibration, is also presented.
A. Justification of the statistical potential energy function
It is assumed that the torsion angle data from crystallized protein molecules belong to the same statistical
ensemble, the crystal ensemble. The molecules can be regarded as frozen near absolute zero because the
observed, averaged torsion angle lies close to the energy minimum. High-resolution X-ray structures are
obtained by suppressing thermal motions toward the vibrational ground state. Excited vibrational modes remain
even under this condition, but inter-nuclear energy curves are generally nearly symmetric, so the averaged
torsion angles change only infinitesimally, if at all.
Although the detailed torsion angle values at room temperature differ slightly from those at low temperature,
the effect on the potential energy functions is negligible. Vibrational modes become increasingly excited as the
temperature rises. Because the inter-atomic potential energy curve is not perfectly symmetric, the averaged
nuclear positions at room temperature, where the protein crystals were made, differ from those under cryogenic
conditions. However, dihedral angles are less sensitive to changes in distances, and the torsion angle variation
is much smaller than the grid size of 15 degrees. Across the data set of 18,352 proteins, such variations are
safely averaged out. The possibility of a large structural change (from trans to gauche+ conformation, for
example) through a solid-solid phase transition at low temperature is also neglected on a finite time scale.
Because the torsion angles are treated as being at zero temperature, the energy and the free energy coincide.
Systematic multiplicity can give rise to residual entropy, as in the carbon monoxide crystal, but the use of
high-resolution X-ray structures safely excludes this possibility. This coincidence allows populations to be
mapped onto energy values rather than free energy values.
The reasoning can be summarized as follows. The observed torsion angle values are virtually the same as those
at the time of crystallization. The small differences in torsion angles, perhaps a few tenths of a degree, are
averaged out over the large protein set and buried within the much larger grid interval.
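The mapping of populations onto energy values described above can be sketched as a Boltzmann inversion on the 15-degree grid. This is only a minimal illustration: the function names, the pseudocount for empty bins, the choice of the most populated bin as the zero of energy, and the default kT = 1 (arbitrary units) are assumptions made here, not details taken from this study.

```python
import math

GRID_DEG = 15                 # grid spacing stated in the text
N_BINS = 360 // GRID_DEG      # 24 bins per torsion angle


def bin_index(angle_deg):
    """Map an angle in degrees to a bin index in [0, N_BINS)."""
    shifted = (angle_deg + 180.0) % 360.0
    # The final modulo guards against floating-point rounding up to 360.0.
    return int(shifted // GRID_DEG) % N_BINS


def populations_to_energy(angle_pairs, kT=1.0, pseudocount=1.0):
    """Histogram (angle1, angle2) pairs and invert the counts to energies.

    E[i][j] = -kT * ln(n[i][j] / n_max), so the most populated bin has E = 0.
    kT and pseudocount are illustrative assumptions, not values from the study.
    """
    counts = [[pseudocount] * N_BINS for _ in range(N_BINS)]
    for a1, a2 in angle_pairs:
        counts[bin_index(a1)][bin_index(a2)] += 1.0
    n_max = max(max(row) for row in counts)
    return [[-kT * math.log(n / n_max) for n in row] for row in counts]


# Toy data (made up): one heavily populated basin and one sparse basin.
pairs = [(-63.0, -43.0)] * 90 + [(-120.0, 130.0)] * 9
energy = populations_to_energy(pairs)
```

In this sketch a shift of a few tenths of a degree leaves an angle in the same 15-degree bin unless it happens to sit on a bin edge, which is the sense in which small variations are buried in the grid interval.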
B. Illustration of the potential energy functions
Figure S1. Illustration of the STAP functions for the isoleucine residue (ILE) as an example. Starting from the
upper left corner and proceeding clockwise, the functions for the φ-ψ, φ-χ1, ψ-χ1, and χ1-χ2 combinations are shown.
C. Full list of targets
The 55 CASP7 targets used are as follows: T0286, T0288, T0289, T0290, T0291, T0292, T0293, T0294,
T0295, T0297, T0298, T0302, T0303, T0305, T0308, T0310, T0313, T0315, T0316, T0317, T0318, T0320,
T0322, T0323, T0324, T0326, T0328, T0329, T0330, T0332, T0333, T0334, T0337, T0338, T0339, T0340,
T0341, T0342, T0345, T0346, T0359, T0362, T0364, T0366, T0371, T0373, T0374, T0375, T0376, T0378,
T0379, T0380, T0381, T0384, T0386.
The 19 CASP9 targets used are as follows: T0531, T0538, T0539, T0541, T0544, T0545, T0551, T0552,
T0553, T0555, T0557, T0559, T0560, T0562, T0564, T0569, T0572, T0579, T0590.
D. Graphs
Figure S2. Comparison between the STAP, the FC-1, and the FC-2 conditions. The blue, red, and green
plots represent the results using the STAP, the FC-1, and the FC-2 conditions, respectively.
Figure S3. Comparison between the STAP, the FC-1, the FC-3, and the FC-4 conditions. The blue, red,
green, and purple plots represent the results using the STAP, the FC-1, the FC-3, and the FC-4 conditions,
respectively.
E. Visualization of the effect of the STAP
The CASP7 target T0286 was modeled under both the FF22 and the STAP conditions, and the model structures
are illustrated. The lowest-energy structures from 50,000-step dynamics simulations were chosen. The Molprobity
web server was used to analyze the structures and to generate Figures S4 and S5.
Figure S4. The structure under the FF22 condition, that is, in the absence of the STAP. Purple dots, green lines,
and orange side chains denote close atom contacts, Ramachandran outliers, and rotamer outliers, respectively.
Figure S5. The structure under the STAP condition, that is, in the presence of the STAP. Purple dots, green lines,
and orange side chains denote close atom contacts, Ramachandran outliers, and rotamer outliers, respectively.
F. The effect of calibration on the STAP
As already mentioned in the text, the STAP is more appropriately used as a correction term alongside the other
force field terms. To construct the correction terms, the energy landscape produced by the other force field
terms should be evaluated, and its values at the grid points should then be subtracted from the grid-type STAP
functions. The evaluation could have been done using dipeptides or tripeptides, as in reference 5
(MacKerell, A. D., Jr., Feig, M., and Brooks, C. L., III, J. Am. Chem. Soc. 126(3), 698–699 (2004)), but this task
was not carried out in this study. Only the φ–ψ combination is handled in MacKerell et al., whereas the three
other combinations (φ–χ1, ψ–χ1, and χ1–χ2) would also need to be treated in this study.
The simplest way to calibrate is to ignore all force field terms related to the bivariate energy landscape:
the electrostatic, torsion, and van der Waals terms in the CHARMM22 force field. The electrostatic and torsion
angle terms can be turned off, but the van der Waals terms cannot, because removing them would cause severe
atomic clashes. In the text, calibration was not attempted and the torsion terms were still evaluated.
At the request of a reviewer, the effect of removing the dihedral terms was tested, although this remedy is not an
exact calibration. When the dihedral terms are turned off, the clash and rotamer scores improve (Table S1). The
relatively high clash score is the main drawback of the with-dihedrals condition (compare the weighted sum
scores inside and outside the parentheses). The reduced clash score under the without-dihedrals condition implies
that stronger functions lead to the relatively high clash scores, which are also a problem in the results reported
in the manuscript. This suggests that calibration could improve the overall result. This improvement is left for
later work.
                          Before calibration:    After calibration:
                          with dihedrals         without dihedrals
Weighted sum score        2.485 (3.051)          2.775 (2.714)
TM-score                  0.768                  0.767
nDOPE                     -0.509                 -0.386
dDFIRE                    -472.386               -450.942
Molprobity clash          14.70                  8.80
Molprobity RAMA           97.05                  97.21
PROCHECK RAMA             92.84                  92.67
WHAT_CHECK RAMA           2.533                  2.156
WHAT_CHECK 1st packing    -1.829                 -2.226
WHAT_CHECK 2nd packing    -2.515                 -2.800
WHAT_CHECK Rotamer        2.836                  3.263
χ1 accuracy               0.4505                 0.4485
χ1+χ2 accuracy            0.2969                 0.2982
Table S1. An attempt to skip the dihedral terms in the modeling process. The weighted sums in parentheses are
evaluated without the clash scores (this convention differs from that used in Tables III, V, VI, and VII).