Statistical Torsion Angle Potential (STAP) Energy Functions for Protein Structure Modeling: A Bicubic Interpolation Approach

Tae-Rae Kim1, Joshua SungWoo Yang2,3, Seokmin Shin1, and Jinhyuk Lee2,3*

1 Department of Chemistry, Seoul National University, Seoul 151-747, Republic of Korea
2 Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology, 125 Gwahak-ro, Yuseong-gu, Daejeon 305-806, Republic of Korea
3 Department of Bioinformatics, University of Science and Technology, 217 Gajung-ro, Yuseong-gu, Daejeon 305-350, Republic of Korea

Keywords: torsion angle, statistical potential, structure modeling, bicubic interpolation

* Corresponding author: Dr. Lee. Tel: +82-42-879-8530; Fax: +82-42-879-8519; E-mail: [email protected]

This Supporting Information addresses the following: the justification of the assumed Boltzmann distribution of torsion angles, illustrations of the potential energy functions, the full list of test sets, and the full graphs used in the discussion. The structures modeled under the FF22 and the CMAP conditions are also visualized for comparison. The effect of calibration by removing the torsion energy terms of the CHARMM22 force field, although not an exact calibration, is also presented.

A. Justification of the statistical potential energy function

It is assumed that the torsion angle data from crystallized protein molecules belong to the same statistical ensemble, the crystal ensemble. The molecules can be regarded as frozen near absolute zero because the observed, averaged torsion angles lie close to the position of the energy minimum. High-resolution X-ray structures are obtained by suppressing thermal motions toward the vibrational ground state. Excited vibrational modes remain even under these conditions, but inter-nuclear energy curves are generally nearly symmetric, so the averaged torsion angle values change at most infinitesimally.
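As a rough illustration of what mapping populations onto energy values can look like, the sketch below bins paired torsion angles on a 15-degree grid and applies an inverse-Boltzmann relation. This Supporting Information does not state the exact formula, so the form E = -kT ln(p), the scale factor `KT`, the pseudocount, and the function name `population_to_energy` are all assumptions made for illustration only.

```python
import numpy as np

BIN_WIDTH = 15.0                 # grid spacing in degrees, as quoted in the text
N_BINS = int(360 / BIN_WIDTH)    # 24 bins per torsion angle
KT = 0.593                       # assumed energy scale in kcal/mol (k_B * 298 K)


def population_to_energy(angles_a, angles_b, kt=KT, pseudocount=1.0):
    """Map paired torsion angles (degrees, within [-180, 180]) onto an energy grid.

    Hypothetical helper: the inverse-Boltzmann form and the pseudocount
    are assumptions, not the procedure confirmed by the paper.
    """
    edges = np.linspace(-180.0, 180.0, N_BINS + 1)
    counts, _, _ = np.histogram2d(angles_a, angles_b, bins=[edges, edges])
    counts = counts + pseudocount        # keep empty bins finite under the log
    prob = counts / counts.sum()         # normalized population per grid cell
    energy = -kt * np.log(prob)          # higher population -> lower energy
    return energy - energy.min()         # most populated cell is set to zero
```

With synthetic angles clustered in a single grid cell, the returned 24 x 24 array has its zero in that cell and finite positive values elsewhere; real input would be the torsion angle pairs collected from the structure data set.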
Although the detailed torsion angle values at room temperature differ slightly from those at low temperature, the effect on the potential energy functions is negligible. Vibrational modes become increasingly excited as the temperature rises. Because the inter-atomic potential energy curve is not perfectly symmetric, the averaged nuclear positions at room temperature, where the protein crystals were grown, differ from those under cryogenic conditions. However, dihedral angles are less sensitive to changes in distance, and the torsion angle variation is much smaller than the grid size of 15 degrees. Across the data set of 18,352 proteins, such variations are safely averaged out. The possibility of a large structural change (from trans to gauche+ conformation, for example) through a solid-solid phase transition at low temperature is also neglected on a finite time scale.

Because the torsion angles are treated as being at zero temperature, the energy and the free energy coincide. Systematic multiplicity can give rise to residual entropy, as in the carbon monoxide crystal, but the use of high-resolution X-ray structures safely excludes this possibility. This coincidence allows populations to be mapped onto energy values rather than free energy values.

The reasoning can be summarized as follows. The observed torsion angle values are virtually the same as those at the time of crystallization. The small differences in torsion angles, perhaps a few tenths of a degree, are averaged out over the large protein set and buried within the much larger grid interval.

B. Illustration of the potential energy functions

Figure S1. Illustration of the STAP functions for the isoleucine residue (ILE) as an example. Clockwise from the upper left corner, the functions for the φ–ψ, φ–χ1, ψ–χ1, and χ1–χ2 combinations are shown.

C. Full list of targets

The 55 CASP7 targets used are as follows: T0286, T0288, T0289, T0290, T0291, T0292, T0293, T0294, T0295, T0297, T0298, T0302, T0303, T0305, T0308, T0310, T0313, T0315, T0316, T0317, T0318, T0320, T0322, T0323, T0324, T0326, T0328, T0329, T0330, T0332, T0333, T0334, T0337, T0338, T0339, T0340, T0341, T0342, T0345, T0346, T0359, T0362, T0364, T0366, T0371, T0373, T0374, T0375, T0376, T0378, T0379, T0380, T0381, T0384, T0386.

The 19 CASP9 targets used are as follows: T0531, T0538, T0539, T0541, T0544, T0545, T0551, T0552, T0553, T0555, T0557, T0559, T0560, T0562, T0564, T0569, T0572, T0579, T0590.

D. Graphs

Figure S2. Comparison between the STAP, the FC-1, and the FC-2 conditions. The blue, red, and green plots represent the results using the STAP, the FC-1, and the FC-2 conditions, respectively.

Figure S3. Comparison between the STAP, the FC-1, the FC-3, and the FC-4 conditions. The blue, red, green, and purple plots represent the results using the STAP, the FC-1, the FC-3, and the FC-4 conditions, respectively.

E. A visualization of the effect of the STAP

The CASP7 target T0286 was modeled under both the FF22 and the STAP conditions, and the model structures are illustrated. The lowest-energy structures from 50,000-step dynamics simulations were chosen. The MolProbity web server analyzed the structures and generated Figures S4 and S5.

Figure S4. The structure under the FF22 condition, that is, in the absence of the STAP. Purple dots, green lines, and orange side chains denote close atom contacts, Ramachandran outliers, and rotamer outliers, respectively.

Figure S5. The structure under the STAP condition, that is, in the presence of the STAP. Purple dots, green lines, and orange side chains denote close atom contacts, Ramachandran outliers, and rotamer outliers, respectively.

F. The effect of calibration on the STAP

As already mentioned in the text, the STAP is more appropriately used in the form of a correction term alongside the other force field terms. To construct the correction terms, the energy landscape produced by the other force field terms should be evaluated, and the values at the grid points should then be subtracted from the grid-type STAP functions. The evaluation could have been done using dipeptides or tripeptides, as in reference 5 (MacKerell, A. D., Jr., Feig, M., and Brooks, C. L., III, J. Am. Chem. Soc. 126(3), 698–699 (2004)), but this was not done in the present study. MacKerell et al. handled only the φ–ψ combination, whereas the three other combinations (φ–χ1, ψ–χ1, and χ1–χ2) would also need to be treated here.

The simplest way to calibrate is to ignore all of the force field terms that contribute to the bivariate energy landscape: the electrostatic, torsion, and van der Waals terms in the CHARMM22 force field. The electrostatic and torsion angle terms can be turned off, but the van der Waals terms cannot, because removing them would cause severe atomic clashes. In the text, calibration was not attempted and the torsion terms were still evaluated. At the request of a reviewer, the effect of removing the dihedral terms was tested, although this remedy is not an exact calibration.

When the dihedral terms are turned off, the clash and rotamer scores improve (Table S1). The relatively high clash score is the main drawback of the with-dihedrals condition (compare the weighted sum scores inside and outside the parentheses). The reduced clash score under the without-dihedrals condition implies that stronger functions lead to the relatively high clash scores, which are also a problem in the results of the manuscript. This suggests that calibration could improve the overall result. This improvement is left for later work.
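The grid-point subtraction described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name `calibrate_stap` is hypothetical, both landscapes are assumed to be sampled on the same 24 x 24 grid (15-degree spacing), and the force field landscape used in the example is a toy cosine surface rather than anything evaluated from CHARMM22 on dipeptides or tripeptides, which this study did not do.

```python
import numpy as np

N_BINS = 24  # 360 degrees divided by the 15-degree grid spacing


def calibrate_stap(stap_grid, ff_grid):
    """Return a correction-term grid: STAP minus the force field landscape.

    Hypothetical helper. Both inputs must be sampled at the same grid points.
    """
    stap_grid = np.asarray(stap_grid, dtype=float)
    ff_grid = np.asarray(ff_grid, dtype=float)
    if stap_grid.shape != ff_grid.shape:
        raise ValueError("STAP and force field grids must share the same shape")
    # Subtract the force field value at every grid point, so that
    # force field + correction reproduces the statistical landscape.
    return stap_grid - ff_grid


if __name__ == "__main__":
    angles = np.deg2rad(np.arange(-180, 180, 15))       # 24 grid points
    # Toy stand-in for a force field landscape (assumption, not CHARMM22 data)
    toy_ff = np.add.outer(1 + np.cos(3 * angles), 1 + np.cos(3 * angles))
    toy_stap = np.random.default_rng(0).random((N_BINS, N_BINS))
    correction = calibrate_stap(toy_stap, toy_ff)
    print(np.allclose(toy_ff + correction, toy_stap))
```

A correction grid built this way would then be interpolated in place of the raw STAP grid, so that the force field terms are not counted twice.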
                              Before calibration:    After calibration:
                              with dihedrals         without dihedrals
Weighted sum score            2.485 (3.051)          2.775 (2.714)
TM-score                      0.768                  0.767
nDOPE                         -0.509                 -0.386
dDFIRE                        -472.386               -450.942
MolProbity clash              14.70                  8.80
MolProbity RAMA               97.05                  97.21
PROCHECK RAMA                 92.84                  92.67
WHAT_CHECK RAMA               2.533                  2.156
WHAT_CHECK 1st packing        -1.829                 -2.226
WHAT_CHECK 2nd packing        -2.515                 -2.800
WHAT_CHECK Rotamer            2.836                  3.263
χ1 accuracy                   0.4505                 0.4485
χ1+χ2 accuracy                0.2969                 0.2982

Table S1. An attempt to skip the dihedral terms in the modeling process. The weighted sums in parentheses are evaluated without considering the clash scores (this legend differs from those of Tables III, V, VI, and VII).