algorithms
Communication
Using Force-Field Grids for Sampling Translation/Rotation of Partially Rigid Macromolecules
Mihaly Mezei
Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA;
[email protected]; Tel.: +1-212-659-5475
Academic Editor: Henning Fernau
Received: 30 October 2016; Accepted: 23 December 2016; Published: 4 January 2017
Abstract: An algorithm is presented for the simulation of two partially flexible macromolecules where
the interaction between the flexible parts and rigid parts is represented by energy grids associated
with the rigid part of each macromolecule. The proposed algorithm avoids transforming the
grid upon molecular movement at the expense of the significantly smaller effort of transforming the
flexible part.
Keywords: force-field grid; protein loops; Monte Carlo; partially flexible molecules
1. Introduction
Atomic-level simulation of macromolecules is currently performed almost exclusively by
molecular dynamics. Its success, due to its following the actual time course of the system, is also
the source of its ultimate limitation: the time scale it can reach. However, it was argued
earlier that Monte Carlo, the preferred technique in the early days of molecular simulation, has the
potential to overcome this limitation as it is not bound to time [1]. In spite of this potential, there are
few groups working along this line [2–4]. A further advantage of the Monte Carlo approach over
molecular dynamics is the ease of limiting the sampling to certain parts of the system and/or to
a limited set of degrees of freedom.
Representing the field of rigid macromolecules on grids specific to the type of the interacting
atoms is a standard technique employed in many docking programs [5,6], where the target is held
rigid and a flexible ligand is moved around the target. Assuming that there are NR atoms in the rigid
part, in the energy calculation for each moving atom the grid representation of the rigid part of the
macromolecule replaces the calculation of NR distances with the calculation of the distances from
the eight nearest gridpoints, followed by interpolation of the corresponding eight energy terms, which adds
an effort equivalent to about two more distance calculations, for a total of about ten. Since the most likely
macromolecule type to be thus modeled is a protein, and protein domains have well over a thousand atoms
(i.e., NR ≃ 1000 is a low estimate in most cases), using the grid representation saves a factor of NR/10, that is, likely
a factor of ~100 or more. This idea has also been combined with the Monte Carlo approach for the
simulation of protein loops [7], where only a small part of the molecule is moved during the simulation,
achieving similar speedup. The fact that the rigid part of the macromolecule is also kept in place
was an important aspect for the success of these applications since it allowed keeping the grid(s)
unchanged during the entire simulation. The algorithm described here is designed to maintain the
gains achievable by the use of the grids for the case when a macromolecule is moved.
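As an illustration of the grid lookup described above, the following is a minimal sketch of interpolating an energy from the eight gridpoints surrounding an atom. The text does not specify the interpolation scheme; trilinear interpolation is assumed here, and the function name, the nested-list grid layout, and the `origin`/`spacing` parameters are hypothetical choices made for this example only.

```python
import math

def trilinear_energy(grid, origin, spacing, point):
    """Interpolate a grid energy at `point` from the eight surrounding gridpoints.

    Assumed layout: grid[i][j][k] is the energy at origin + spacing * (i, j, k),
    with at least two gridpoints along each axis.
    """
    dims = (len(grid), len(grid[0]), len(grid[0][0]))
    base, upper_weight = [], []
    for p, o, n in zip(point, origin, dims):
        f = (p - o) / spacing                # fractional grid coordinate
        if f < 0 or f > n - 1:
            raise ValueError("point lies outside the grid")
        i = min(int(math.floor(f)), n - 2)   # lower corner of the enclosing cell
        base.append(i)
        upper_weight.append(f - i)           # weight of the upper corner on this axis
    energy = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                # Product of per-axis weights for this corner of the cell.
                weight = 1.0
                for step, w in zip((di, dj, dk), upper_weight):
                    weight *= w if step else 1.0 - w
                energy += weight * grid[base[0] + di][base[1] + dj][base[2] + dk]
    return energy
```

The cost of this lookup is independent of NR, which is the source of the NR/10 saving estimated above; one such grid would be kept per atom type.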
When modeling the interaction of two partially rigid macromolecules, the straightforward
extension of these algorithms would run into the difficulty of having to move at least one of the
sets of grids. This introduces additional computational expenses: (a) finding the nearest eight gridpoints
would be more involved since the rotated grids would not be aligned with the Cartesian axes and (b) the
additional computational burden of transforming Nt × Ng^3 grid values (with Nt being the number
of grid types and Ng the number of gridpoints along one Cartesian direction) whenever a whole
macromolecule is moved. The proposed algorithm would eliminate these problems at
negligible added computational cost.
Algorithms 2017, 10, 6; doi:10.3390/a10010006
www.mdpi.com/journal/algorithms
2. The Algorithm Proposed
This note describes an algorithm that allows the movement of the rigid part while maintaining
the grids in their original positions, giving rise to the same savings as obtained for the single
molecule system.
Figure 1 shows the schematics of the system considered. The rigid parts of the macromolecules are
represented by the oblongs labeled R1 and R2, and the flexible parts are represented by the
loops F1 and F2. Macromolecule 2 will be held in place with only the F2 part moving, while
macromolecule 1 is allowed to translate/rotate while also changing the conformation of the part
represented by the atoms forming F1.
Figure 1. Schematics of the system considered. R1, R2: rigid parts of macromolecules 1 and 2,
respectively; F1, F2: flexible parts of macromolecules 1 and 2, respectively.
The key to the algorithm proposed is the simultaneous updating of two equivalent copies of the
flexible parts, labeled F1′ and F2′ in Figure 2. The difference between the primed and unprimed
versions is just a translation/rotation of the rigid part; their conformations are identical. The initial
position of macromolecule 1 is R1 and its position after rigid translation/rotation is R1′. Such a
transformation would also require the transformation of grid G1 into G1′, as shown in Figure 2, but
the proposed algorithm avoids this. Instead, the algorithm introduces the alternative position of
macromolecule 2, R2′. It is constructed by translating/rotating the whole complex in such a way that
the rigid part of macromolecule 1 is overlaid on its initial position/orientation. Note that, in reality,
only the coordinates of the flexible part F2 are involved in the calculation.
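The construction of the primed copies can be illustrated with a small sketch. It assumes the rigid move of macromolecule 1 is given as a rotation matrix plus a shift, so that F1′ is obtained with the forward move and F2′ with its inverse. The helper names and the toy coordinates are hypothetical, and only a rotation about the z axis is built here for brevity.

```python
import math

def rotation_about_z(angle):
    """3x3 rotation matrix for a rotation by `angle` radians about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def move(rot, shift, coords):
    """Apply the rigid move x -> rot @ x + shift to every point."""
    return [tuple(sum(rot[r][k] * x[k] for k in range(3)) + shift[r]
                  for r in range(3)) for x in coords]

def move_back(rot, shift, coords):
    """Apply the inverse move x -> rot^T @ (x - shift) to every point."""
    return [tuple(sum(rot[k][r] * (x[k] - shift[k]) for k in range(3))
                  for r in range(3)) for x in coords]

def distances(a, b):
    """All pairwise distances between the points of a and the points of b."""
    return [math.dist(p, q) for p in a for q in b]

# Toy coordinates (hypothetical): two atoms of R1 and two atoms of F2.
r1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
f2 = [(4.0, 1.0, 0.5), (5.0, -1.0, 2.0)]
rot, shift = rotation_about_z(0.7), (0.3, -1.2, 2.0)

r1_primed = move(rot, shift, r1)       # what moving macromolecule 1 (and G1) would do
f2_primed = move_back(rot, shift, f2)  # what is transformed instead: only F2

# The F2-R1' distances equal the F2'-R1 distances, so the untransformed
# grid G1 can be used with F2' in place of a transformed grid with F2.
same = all(abs(d1 - d2) < 1e-9 for d1, d2 in
           zip(distances(f2, r1_primed), distances(f2_primed, r1)))
```

Because a rigid move preserves distances, any distance-based interaction of F2 with the moved rigid part is reproduced by F2′ with the rigid part in its initial place.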
Since E(F1′,G1′) = E(F1,G1) and E(F2,G1′) = E(F2′,G1), where E(F,G) represents the energy of flexible
part F interacting with grid G, this arrangement allows the calculation of the energy of a new complex
as follows:
E = E(F1,G1) + E(F2,G2) + E(F1′,G2) + E(F2′,G1) + E(F1′,F2)    (1)
where E(F1′,F2) is the energy of the interaction between flexible parts F1′ and F2.
Two kinds of changes are considered: translating/rotating macromolecule 1 in the current
conformation of its flexible part, F1, or changing the conformation of the flexible part of one of the
macromolecules, F1 or F2. When macromolecule 1 is translated/rotated, the coordinates of F1′ and F2′
have to be updated. When the flexible part of macromolecule 1 is changed, the coordinates of F1 and
F1′ have to be updated. When the flexible part of macromolecule 2 is changed, the coordinates of F2
and F2′ have to be updated.
Note that in Equation (1), the energy of the interaction between the two rigid parts, being separated
by the layers of the flexible part, is assumed to be negligible. For the van der Waals interactions,
this assumption holds to a good approximation. As for the electrostatic interactions, it is possible to
include them, at negligible computational expense, using a multipole representation of the charge
distribution since the two rigid parts can be enclosed in nonoverlapping spheres, ensuring the
validity of the multipole representation.
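The bookkeeping behind Equation (1) and the three update rules can be sketched as follows. This is only an illustration under stated assumptions: the class and function names are hypothetical, the rigid move and its inverse are passed in as plain callables, and `grid1`, `grid2` and `pair` stand for whatever routines evaluate E(F,G1), E(F,G2) and the direct flexible-flexible energy.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Coords = List[Tuple[float, float, float]]
Transform = Callable[[Coords], Coords]

@dataclass
class TwoMoleculeState:
    f1: Coords            # F1, in the initial frame of macromolecule 1
    f2: Coords            # F2, macromolecule 2 stays in place
    place_1: Transform    # current rigid move of macromolecule 1 (F1 -> F1')
    unplace_1: Transform  # its inverse (F2 -> F2')
    f1_primed: Coords = field(init=False)
    f2_primed: Coords = field(init=False)

    def __post_init__(self):
        self.f1_primed = self.place_1(self.f1)
        self.f2_primed = self.unplace_1(self.f2)

    def move_molecule_1(self, place: Transform, unplace: Transform) -> None:
        """Rigid move of macromolecule 1: F1' and F2' are updated."""
        self.place_1, self.unplace_1 = place, unplace
        self.f1_primed = place(self.f1)
        self.f2_primed = unplace(self.f2)

    def change_f1(self, new_f1: Coords) -> None:
        """Conformation change of F1: F1 and F1' are updated."""
        self.f1 = new_f1
        self.f1_primed = self.place_1(new_f1)

    def change_f2(self, new_f2: Coords) -> None:
        """Conformation change of F2: F2 and F2' are updated."""
        self.f2 = new_f2
        self.f2_primed = self.unplace_1(new_f2)

def total_energy(state, grid1, grid2, pair):
    """Equation (1): both grids are evaluated only in their original positions."""
    return (grid1(state.f1) + grid2(state.f2)
            + grid2(state.f1_primed) + grid1(state.f2_primed)
            + pair(state.f1_primed, state.f2))
```

In an actual Monte Carlo step only the terms touched by the attempted move would need to be recomputed; the sketch recomputes all five for clarity.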
Figure 2. Schematics of the copies needed for the proposed algorithm. R1, R2: rigid parts of
macromolecules 1 and 2, respectively; F1, F2: flexible parts of macromolecules 1 and 2, respectively;
G1, G2: the sets of grids representing the field of the rigid parts R1 and R2, respectively. Primed items
represent the systems after change from the initial position, orientation and conformation. The two
interacting systems are shown with their distance increased for clarity.
3. Discussion
This note presented an algorithm that extends the use of grid-based force fields to the modeling of
two partially flexible macromolecules by eliminating the added computation involved in determining
the nearest gridpoints and the computation needed to transform the grids. As for the determination of
the nearest gridpoints, the proposed algorithm's use of a transformed copy adds a similar
computational expense, so there is no net effect. The main saving is achieved by eliminating the
transformation of the grids.
Assuming that there are 10 different grids corresponding to different atom types, the proposed
algorithm would save the transformation of ~10 × Ng^3 points at the negligible expense of transforming
the additional NF atoms of the flexible part. Assuming that Ng ≃ 200 (a reasonable value in
our experience) and that the flexible part is about 10% of the whole molecule, about 10 × 200^3 = 8 × 10^7
transformations can be eliminated at the negligible expense of ~2 × 10^2 additional transformations
of the flexible part. The estimate of the overall effect is less straightforward. It is reasonable to
assume that the volume of the grid is proportional to the volume of the flexible atoms or, equivalently,
that the number of gridpoints along an axis is proportional to the cube root of the number of flexible
atoms, i.e., Ng = c1 × NF^(1/3), and that the overall energy calculation will be dominated by the direct
interactions of the flexible parts. Assuming further, as is generally true for Markov-chain Monte
Carlo, that a move involves a small, limited fraction of the atoms available for moving, the work
in calculating the energy change is c2 × NF. This means that the ratio of the work involved in the
energy calculation per step to the corresponding savings by using the algorithm proposed here is
[c2 × NF]/[W × 10 × (c1 × NF^(1/3))^3] = c2/(10 × W × c1^3)
where W is the fraction of the time during which the whole molecule is moved. Thus, in the limit of
a large number of flexible atoms, the expense of transforming the grid is commensurate with the rest
of the energy calculation, so the upper limit of the savings is a constant that depends, among other
things, on the frequency of whole-molecule moves. Likely applications, however, are far from this
limit: assuming (a relatively large) NF ≃ 100 and Ng ≃ 200 and the number of changed flexible atoms
~50 (an overestimate), the ratio of the energy calculation to the grid transformation calculation would be
(50 × 100)/(W × 10 × 200^3) × F = 10^−4 × (5/(8 × W)) × F
where F is the ratio of the computational expenses of calculating a distance and transforming
a coordinate. Clearly, this shows a significant gain for systems of likely size.
Thus, it is expected that this algorithm would significantly speed up the modeling of protein
complexes where the interaction region is relatively small compared to the overall volume of the
complex and foster further developments in the Monte Carlo methodology. It is planned to implement
it in the program called MMC [4], which already includes the single-grid version. A typical example
for its use would be modeling major histocompatibility complexes (MHC) [8]. An example of such
a system is shown in Figure 3.
Figure 3. Crystal structure of MHC class I HLA-A2.1 bound to a photocleavable peptide (PDB ID:
2X70). The possible choices of flexible parts corresponding to F1 and F2 are shown in red and blue
spheres, respectively.
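The order-of-magnitude estimate given in the Discussion can be reproduced with a few lines of arithmetic. The sketch below only restates the quantities named in the text (10 grid types, Ng ≃ 200, NF ≃ 100, ~50 changed atoms, W and F); the function and parameter names are hypothetical.

```python
def grid_points_to_transform(n_grid_types, n_g):
    """Number of grid values that a whole-grid transformation would touch."""
    return n_grid_types * n_g ** 3

def energy_to_grid_work_ratio(n_changed, n_flex, n_grid_types, n_g,
                              whole_move_fraction, cost_ratio=1.0):
    """Ratio of energy-calculation work per step to the avoided grid work.

    whole_move_fraction is W (fraction of steps moving the whole molecule);
    cost_ratio is F (cost of one distance relative to one coordinate transform).
    """
    energy_work = n_changed * n_flex * cost_ratio
    grid_work = whole_move_fraction * grid_points_to_transform(n_grid_types, n_g)
    return energy_work / grid_work

points = grid_points_to_transform(10, 200)                    # 10 * 200**3
ratio = energy_to_grid_work_ratio(50, 100, 10, 200, 1.0)      # W = 1, F = 1
```

With these inputs, `points` evaluates to 8 × 10^7 and `ratio` to 6.25 × 10^−5, i.e., 10^−4 × 5/8; smaller values of W raise the ratio in inverse proportion.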
Acknowledgments: This work was supported in part through the computational resources and staff expertise
provided by the Department of Scientific Computing at the Icahn School of Medicine at Mount Sinai.
Conflicts of Interest: The author declares no conflict of interest.
References
1. Mezei, M. On the potential of Monte Carlo methods for simulating macromolecular assemblies. In Third International Workshop for Methods for Macromolecular Modeling Conference Proceedings; Gan, H.H., Schlick, T., Eds.; Springer: New York, NY, USA, 2002.
2. Jorgensen, W.L.; Tirado-Rives, J. Molecular modeling of organic and biomolecular systems using BOSS and MCPRO. J. Comput. Chem. 2005, 26, 1689–1700.
3. Ullmann, R.T.; Ullmann, G.M. GMCT: A Monte Carlo simulation package for macromolecular receptors. J. Comput. Chem. 2012, 33, 887–900.
4. Mezei, M. MMC: Monte Carlo Program for Molecular Assemblies. Available online: http://inka.mssm.edu/~mezei/mmc (accessed on 27 December 2016).
5. Goodford, P.J. A computational procedure for determining energetically favorable binding sites on biologically important macromolecules. J. Med. Chem. 1985, 28, 849–857.
6. Meng, X.-Y.; Zhang, H.-X.; Mezei, M.; Cui, M. Molecular docking: A powerful approach for structure-based drug discovery. Curr. Comput. Aided Drug Des. 2011, 7, 146–157.
7. Cui, M.; Mezei, M.; Osman, R. Prediction of protein loop structures using a local move Monte Carlo approach and a grid-based force field. Protein Eng. Des. Sel. 2008, 21, 729–735.
8. Janeway, C.A., Jr.; Travers, P.; Mark, W.; Mark, J.S. Immunobiology: The Immune System in Health and Disease, 5th ed.; Garland Science: New York, NY, USA, 2001.
© 2017 by the author; licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC-BY) license (http://creativecommons.org/licenses/by/4.0/).