Getting Started with MD

A tutorial for preparing a MOIL simulation from a PDB file
A user blog
Here is a not-so-quick introduction to how we can pick a protein structure from the PDB
(Protein Data Bank) and process it into a form useful for MOIL calculations.
It is assumed that moil is installed. [Most users will have installed pre-built versions of
MOIL from http://clsb.ices.utexas.edu/prebuilt/, in which case the following reference to
compilation of source code is not necessary]. We will allow a few glitches in installation
as described below, but the starting point is the existence of a moil directory and a
successful compilation of the source code with one of the make_distribution files
appropriate for your computer, (or by manual copy of the relevant exe/binary files to the
moil/moil.source/exe directory). If you are not there yet (e.g. the directory
moil/moil.source/exe is empty, in Windows it is the directory moil/moil.exe for the
moment) go to https://wiki.ices.utexas.edu/clsb/wiki, read the text under the links
GetPackagedMoil, GetSubversionMoil, and BuildingMoil, and follow the instructions.
We will look into hydrogen building, energy calculations, and construction of a solvation
box. We will NOT look at some of the unique computational tools that were introduced by
the MOIL team, such as LES or reaction path calculations. Those blogs may or may not come
later.
Ok. Here we go. Below I type between “…” commands that are typed with the keyboard
and/or into specific text windows. Mouse clicks will be described with italics. Computer
text is either with “…” or underline.
We start by selecting a protein structure from the PDB. I am addicted to Firefox, but a
similar procedure is expected to apply to Explorer/Safari or whatever is your favorite
browser. Type in the url of the PDB “http://www.rcsb.org”. Once the web site is loaded,
type in the search box (select PDB ID or keyword) “mbco”. This search is followed by a
list of possible hits.
Among the many myoglobin structures found, we shall use the entry 2bw9. There are
several icons near the 2BW9 text. Click on the icon to download the file 2bw9.pdb and
move the file to the directory you wish to use for this project (on my home Mac it is the
directory “~/work/get_started”).
The first thing we want to try is to visually inspect the structure. We will use zmoil for
that purpose.
Note: the following description instructs you on the use of the MOIL graphical user
interface as a means to launch the molecular-viewer application Zmoil. However, Zmoil
may also be launched independently for viewing standard file formats such as PDB. See
the Zmoil documentation for more information.
[Windows users may launch MOIL from the Desktop shortcut and/or the Start Menu after
having installed MOIL. In the following discussion which refers to running moil from
the command line, you may simply run MOIL as usual from your icon, and when
opening files within MOIL, simply browse to the folder containing the desired file.]
We go to the directory with the PDB file “cd ~/work/get_started”, then type “moil.tcl”
(moil requires tcl/tk installed on your computer, unless you are using the Windows
version of moil). It is assumed that you added the directory ~/moil/moil.gui to your path.
If not, you will get the message moil.tcl not found (just what happened to me now). So
being lazy I try the most straightforward approach and type the whole path
~/moil/moil.gui/moil.tcl (for future reference you probably should edit your .cshrc or
your equivalent default shell startup file to add ~/moil/moil.gui to your path). [Windows
users may also add the moil.gui folder to their path if they desire to run MOIL from the
command line. Windows users should simply type “moil” rather than “moil.tcl” to
launch MOIL from the command line.]
Success!! The moil.tcl window appeared (shown below) with a top bar offering a few options
to choose from and a so-so picture of the gramicidin ion channel. At the bottom of the
moil window there are links to our site and e-mail. You can use these links to read more
about the research conducted with moil and about its development. You can also e-mail us
about bugs, which we will happily hunt down and respond to as quickly as possible. Note
however that this is free code and we are not paid to guide potential users through the
application code. You may wish to contact Ron Elber if you need extensive
support.
Ok. Let us first look at the PDB structure that we just downloaded. Click on View
Structure and select New Config File. A new page appears with a lot of text boxes to fill
(shown below). Do not forget your credit card number and billing address (just
kidding…). You can click at the lower right corner of the page and drag the mouse to
enlarge (or decrease) its size. The default size is rarely optimal.
To start we will make only minimal selections. Erase the text “.wcon” from the entry on
the top (the “Connectivity File”). Go to the line below (marked Coordinate File) and click
on the right side of the same line on Browse. Drag the window with the mouse clicked at
the lower right corner if some of the text is hidden. Highlight (click on with the mouse)
the file 2bw9.pdb and then click ok. Choose how the protein will be drawn by selecting
options next to the line Display Mode: choose Stick, and to add some flavor to the
picture click also on the button on the left of Secondary Structure.
Use the scroll bar at the right side of the window to scroll to the end of the page. There
are a couple of action buttons below. DO NOT click on Quit, and try instead clicking on
View (zmoil).
The dialog that first appears gives a crash course on the function of the mouse in zmoil. It
is useful to know that zmoil works best with a mouse with at least three buttons. The left
is to rotate the molecule, the right is to translate it, and the middle to zoom. The middle
and right buttons can also be used to scroll the UI panels, such as the long panel on the left:
moving your mouse over the panel and either scrolling with the middle mouse wheel or
right-click dragging will allow you to expose additional UI. On the MAC mouse (one
button only) a click is a left click, and a right click is CTRL-click. You may select not to
see this message ever again by clicking on the left of the text Do not show this message
again. Otherwise just click ok at the bottom of this message and move on to the next
paragraph.
By now you should have a picture of the screen as above (if not, here is your opportunity
to write your first bug report!). Note the extensive clickable options on your left. I will let
you explore the visualization options on your own. Some of the more advanced features
include printing an image, measuring a torsion angle and beautifying the picture. If you
scroll down the instruction windows on the left (via mouse-wheel or right-click dragging)
you will discover a submenu called Surface Editor. On the right side of the surface editor
you will find a button Display surface. Click on it. It will take a few seconds to compute
the surface and then it will be displayed as covering half of the molecule. Rotate the
molecule with the left mouse button to have a better appreciation for the display surface.
Close zmoil (by pressing ESC or clicking the window-close icon) when you are satisfied.
Let us now try to process the PDB file and prepare the necessary input file for simulation.
Go to the upper left corner of the main moil window and click on assemble and select
process PDB. Another worksheet shows up (as shown below). Note that in the lower left
part of the new page there is a clickable spot, Use default Parameters. The intention is
to reduce complexity when it is not necessary. We will first use the simpler procedure;
however, we will need to use some extra parameters (and complexity) later (the image
below is produced when you click the Default Parameters button):
The first active line of the new page starts on the left with PDB file. Click on the right
side of the same line on Browse. A file selection window appears. Select the PDB file of
interest, 2BW9.pdb (or whatever slight variation your file may be named), by highlighting
it with the mouse, and then click on OK. The file selection window disappears, and new
text appears in various entries in the Process PDB worksheet. After you pick your PDB
file, necessary files are assigned default names. For example, your molecule is now
named 2BW9. You can change those file names if you wish but you do not have to, and it
is a good practice to leave them as is for easy tracking in the future. Go now to the
bottom of the page and click on Run Locally.
If you expect to receive a message of the type: “All is well, continue to next level” you
are a true optimist. Unfortunately working with MOIL requires you to be a realist. As a
rule of thumb, the conversion from a PDB to MOIL internal format and the generation
of the appropriate file for energy calculation never works the first time, and the present
case is no different (sometimes it does not work even the fourth time, but as long as
progress is made do not despair).
The messages we receive this time are divided between two windows. The more serious
problem also comes with the more threatening picture (with the hand signaling you to
stop). The message is “CMO” not in /Users/ron/moil/moil.gui/defaults/ALL.MONO. A brief
explanation of the way MOIL interprets PDB files is now in order if we are to overcome
this particular barrier.
This tutorial was written in September 2008. Since that time, CMO has been added to
the MOIL monomer database, so the “CMO” problem described will not occur. The
following explanation is left here for reference, since similar problems are likely to be
encountered by MOIL users. Since CMO is in the MOIL monomer database, you will
not need to edit the PDB file as directed below.
MOIL reads the residues (or “monomers” in MOIL’s lexicon) from the PDB file and tries
to match these monomers with monomers that already exist in the MOIL database. The
latter are fully parameterized and characterized. The database of MOIL is quite extensive
and includes (for example) all the amino acids, water models, models of nucleotides,
ions, and a few small molecules that serve as ligands. You may wish to have a peek at
this file by clicking on edit in the line starting with Monomer File in the Processing PDB
working sheet (DO NOT change it). However, it is obviously not complete. The program
relies on the monomer information to know which atom is connected to which, how
strongly, etc. If no such match is made, there is a problem. It is not possible to define an
energy function for a group of unidentified atoms.
In the PDB file at hand, there exists the monomer CMO. Do you know what it is? Neither
did I! And of course MOIL is clueless as well. MOIL is trying its best and the message
about CMO is (translated to English) – “Sorry, I cannot continue, however, if you insist
the best I can do is to remove this offending residue and work with the rest”. You may
answer “yes”, then the program will remove the CMO and you can forget about it ever
after. However, is this what you want? Perhaps CMO is where the most interesting action
takes place.
Actually the mysterious CMO is important. It stands for Carbon MonOxide, which is the
ligand whose diffusion we wish to follow and investigate. So we had better keep it. It is
time to stop following the automaton and to use superior human intelligence. Click on Stop. Not
surprisingly the next message is Could not process PDB file… Click close.
We now have two windows (a) *_pdb.log (the * is for wild card) and (b) Processing
PDB. We do not need (a) to continue but it is a good point to explain what it is all about.
In the *.log we see many lines that look like
Dropped: ATOM 239 CD1BTRP M 29 37.474 12.072 15.406 0.50 30.77
This is a copy of a line from the PDB of an atom CD1 from the residue TRP. The atom
CD1 got an extension to CD1B since the side chain has multiple conformations and in the
large number of molecules in the crystal (Avogadro or some such) some are found in the
crystal at alternate conformations. The multiple conformations add up to a more complex
spectrum that can still be interpreted by having TRP in more than one conformation.
Obviously a single molecular model will have only one side chain for a particular amino
acid (actually the LES code of MOIL allows you to retain more than one copy of a side chain,
but this is another story). At present MOIL makes an arbitrary choice and keeps only the
first conformation listed in the file. The atom of TRP that will be kept is CD1A, while
CD1B etc. will be dropped. You may have a different opinion about which atoms are needed. If
you do, you will need to edit the PDB file yourself and keep the conformation close to
your heart. At present MOIL does not allow for choices of side chain conformations
through the graphic interface. In the present exercise we will let MOIL choose and
therefore the output of *.log is not alarming.
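If you would rather see the "keep the first conformation listed" rule spelled out, here is a minimal Python sketch of it. It is not MOIL's code, just my reading of the behavior described above. It assumes the standard fixed-column PDB layout (alternate-location flag in column 17, chain in column 22, residue number in columns 23 to 26). The function names and the example atoms are made up for illustration.

```python
def keep_first_conformation(pdb_lines):
    """Drop atoms that belong to any alternate conformation but the first one listed."""
    chosen_flag = {}  # (chain, residue number) -> first alternate-location flag seen
    kept = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            kept.append(line)  # headers, TER, END, ... pass through untouched
            continue
        alt_loc = line[16:17]  # PDB column 17: alternate-location flag
        residue = (line[21:22], line[22:27])  # chain ID, residue number + insertion code
        if alt_loc.strip() == "":
            kept.append(line)  # atom has a single conformation
        elif chosen_flag.setdefault(residue, alt_loc) == alt_loc:
            kept.append(line)  # atom belongs to the first conformation listed
    return kept


def atom_line(serial, name, alt_loc, residue, chain, number):
    """Build a made-up fixed-column ATOM line for the demonstration below."""
    return (f"ATOM  {serial:>5} {name:<4}{alt_loc}{residue:>3} {chain}{number:>4}"
            f"    {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{0.5:6.2f}{30.77:6.2f}")


# Hypothetical atoms: two conformations of TRP 29 CD1 and one ordinary atom.
example = [
    atom_line(238, " CD1", "A", "TRP", "M", 29),
    atom_line(239, " CD1", "B", "TRP", "M", 29),
    atom_line(240, " CA ", " ", "ALA", "M", 30),
]
for line in keep_first_conformation(example):
    print(line)
```

If you prefer the B conformation, the same idea works in reverse: edit the PDB file so that only the atoms you want remain before handing it to MOIL.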
We need to do something about CMO. Lucky for us there is a simple solution. Carbon
monoxide “monomer” already exists in ALL.MONO. However, it is not called CMO but
CO instead. If we slightly edit the PDB file we should be all set. Let us look first at the
definition of CO in moil.
Click on edit at the right side of the Monomer File line in the Processing PDB page
(note that if you double click on Monomer File a new window will appear with a short
explanation of what the monomer file is all about). A new page opens with three internal
windows titled Monomers, Particles, and Bonds as shown below. If you scroll the sidebar
of Monomers you will find CO close to the bottom. Click on CO. In the other windows
you will “discover” that CO consists of two particles named C and O (the particle types
are CM and OM) and that the C is bonded to the O. You will also note that CMO, newly added
since this was originally written, lives just next to CO in the Monomer list. If you click
on it, you will see that it is defined identically to CO.
Particle types are used to define the physical and chemical properties of an atom (or
particle). For example the carbon and the hydrogen attached to it in a benzene ring are
one atom type (say CH). All atoms in Benzene are of the same type, and they are used to
identify the energy terms. However each of the atoms must have a unique name within
the monomer, for example C1, C2,…,C6 in Benzene. In the monomer CO the unique
particle identifiers are C and O while the types of the atoms are CM and OM. The atom
names C and O must match the names in the PDB file.
Let us edit the PDB file now. We want to keep the ALL.MONO as steady as possible and
not follow all the fluctuations in different PDB files. In principle it should be possible to
code into MOIL all the known monomers that appear in the PDB and do a better job in
identifying different monomers. This task is however more painful than it may seem. The
PDB is rapidly moving against us and constantly generates exciting new names for
monomers to confuse MOIL and other similar programs. There is no standard for
monomers other than amino acids and nucleic acids, and that is a problem. We know
when to admit defeat and at this point in the space-time continuum we have decided to
leave some monomer adjustments to the users and give MOIL a break.
Go back to the page Processing PDB and click on edit at the right hand side of the line
PDB File. An Editor window appears as shown below that makes it possible to simply
adjust the open file. Use the right hand side bar to scroll down and find the offending
CMO residue/monomer. It is coming immediately after the HEM (which stands for
heme). Change the residue name from “CMO” to “CO” (after removing the “M” do not
forget to add a space after the “O” so that the columns of the rest of the line stay
aligned; the file has a fixed format). Now the residue name matches the corresponding name
in MOIL. The names of the particles/atoms C and O already match the names of the particles
in ALL.MONO, so there is no need for a change there. Save the file by clicking on Save.
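For those who prefer a script to the editor, the same rename can be sketched in a few lines of Python. This is only an illustration of the manual edit described above. It assumes the standard PDB layout with the residue name in columns 18 to 20, and it pads the new name with a trailing space exactly as in the hand edit. The function name and the example record are hypothetical.

```python
def rename_residue(pdb_lines, old="CMO", new="CO"):
    """Rename a residue in ATOM/HETATM records without shifting any columns."""
    if len(new) > 3:
        raise ValueError("PDB residue names occupy at most three columns")
    renamed = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and line[17:20].strip() == old:
            # Overwrite columns 18-20 only; a trailing space keeps the width at three.
            line = line[:17] + new.ljust(3) + line[20:]
        renamed.append(line)
    return renamed


# A made-up HETATM record for the carbon of the ligand (numbers are illustrative).
record = ("HETATM" + " 1234" + " " + " C  " + " " + "CMO" + " " + "M" + " 155"
          + "    " + "  10.000  11.000  12.000  1.00 20.00")
print(rename_residue([record])[0])
```

Only the three residue-name columns are touched, so the coordinates further down the line stay where MOIL expects them.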
Back to Processing PDB. We have made progress, but I would be lying to you if I claimed that
we are done or even almost done. This is despite the encouraging message we will
receive following our next action. Let us try to process the PDB file again by clicking on
Run Locally at the bottom of the page.
It is with a deep sense of satisfaction that I inform you that a little red window shows up
telling us that it is working and then it disappears. Yet another small window follows
which announces something like bla bla bla bla bla… has been successfully generated. A
new fresh blue button is seen below the above text with the tempting-to-click continue.
Do it, press on continue. Everything suddenly disappears and only the moil.tcl menu
remains. Do not worry! Nothing crashes! It is just the way MOIL lets you know that you
have graduated and can now step into a more advanced class. Let us illustrate how we
can now pursue a quick and dirty energy calculation.
Click on Calculate in the top bar of the only window of moil left and select Energy and
then New Config File. The second option of Reload Config File is for cases in which we
had to cut our session short and run to a class or something else. Moil saves changes we
have made in a file and we could reload the session and continue from the point we
stopped once we returned. We will not Reload this time since the tutorial sessions are
minimal. For a sequence of mouse selections I will sometimes use a “/”. E.g. the last case
could have been written Energy/New Config File. The new worksheet of interest is
Energy Information and is shown below. We are still working with default parameters, so
only truly essential information is included. The two Input files on the top of the page
were created by our previous adventure (not shown in the figure). The 2bw9.wcon file is
the connectivity file that contains all the information required for energy calculations
(like bond lengths and bond force constants, etc.). You may view this file (but not edit it)
by clicking on the view button on the right side of the Connectivity File line. You may
also read more on the connectivity file by quickly double-clicking on the Connectivity
File text. A new window will appear with more information on this file (this works for all
named parameters).
The second input file Coordinate File includes the coordinates appropriate for energy
evaluation within MOIL. Some atoms (like hydrogens) that are used in the energy
calculations are not present in the PDB files. Our “processing” of the PDB file generates
new coordinates for these atoms and the results are written in “crd” format. The major
output file of energy is below the input files and is called here by a default name
2bw9.wene.
Note that the images of windows were taken on a Windows machine and therefore
the directory names do not correspond to the names I listed in the text.
To create the energy output file let us run the energy program by clicking the Run Locally
button at the lower end of this page. We receive a message that the program is running
and then appears the blue button to continue. By now you should know what to do with
the continue button. A final message pops up when the program has finished. The new file that
is opened when the calculation is all done is 2bw9.wene and includes the energy listing as
shown below. It looks decent, which may raise our hopes. After all, once the *.crd and
*.wcon files are generated we can do many of the MOIL calculations in a straightforward
way.
It also does not look like it was too hard so far (or was it??). However, the connectivity
file we generate is wrong and we need to revisit the way we generated it. We also would
like to put our protein in a box of water, so there is still a considerable body of work to be
done and the next paragraph is a good place to re-start the process. We quit the Energy
worksheet and we are now ready for our next adventure.
Here is another piece of information that we did not think about in our first attempt. In
myoglobin we have a prosthetic group (the heme) that is covalently bonded to the
proximal histidine. It may be bonded to the ligand as well, but this is not what we are
after. We wish to simulate the diffusion of the unbound ligand. The bonding of the heme
challenges the way MOIL generates the covalent structure of the molecule and we need
to help it a little. MOIL generates covalent bonds between monomers that it recognizes as
being part of a polymer. If we give it a sequence of amino acids it will recognize that they
bind sequentially, but how should MOIL recognize that heme binds to one of the
histidines in the amino acid sequence? MOIL is not that smart and we must tell it where
the binding is taking place. The facility for doing that is “addbond”. If we do not use
this facility (and in the beginning, using default parameters, we did not), no covalent
bond between the heme and the peptide chain is constructed and the bonding is incorrect.
We start by opening the Process PDB work sheet one more time. Unclick Use default
parameters to get the full list of variables for your consideration. In the line of
additional binding type in the space available “2bw9.addb” and then click on Edit. MOIL
complains that it cannot find this file. Nevertheless it still does what it is expected to do
and opens a blank edit page. At this point it may help to read the manual or some such.
There are no buttons to click. You may consider visiting the directory moil/moil.doc/gui
and read the file “special” which is quite old. For a more recent file go to the examples
(or tests) directory at moil/moil.examples/ [if your distribution did not come with a
moil.examples folder, you can find one at the download location given at the beginning
of this tutorial]. The directory of interest is moil/moil.examples/myo and the file with
the information we are now after is mbco.addb. Regardless of whether you view the examples
or not, let me type the short answer below and then explain how it came about.
bond chem HIS 94 NE2 HEM1 156 FE
*EOD
There are only two lines in the addbond file. The first one is the only line that has a
functional value. It declares “bond” between a “chem”ical group (or a monomer) of
“HIS” number 94, atom NE2, and a monomer “HEM1” number 156 and atom name FE.
The second line *EOD is just an indication to the program that the list of bonds is
finished. Obviously to write this line we need to know something about the system at
hand. Well, everyone that reads a beginner biochemistry book should know about
hemoglobin, heme, and the fact that the proximal histidine is attached to the heme. That
much is a requirement before simulating a protein of the globin family using MOIL
(sorry for not letting you know earlier).
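The addbond file is simple enough that you can generate it from a list of bonds. The sketch below is a hedged illustration and not part of MOIL. It reproduces the two-line layout shown above, one "bond chem" line per bond followed by *EOD. The function name is made up, and I am assuming that single spaces between fields are acceptable to the parser, since that is how the example above is written.

```python
def addbond_text(bonds):
    """Format extra covalent bonds in the layout of the addbond example above.

    Each bond is (monomer1, index1, atom1, monomer2, index2, atom2), where the
    indices are MOIL (crd/wcon) monomer numbers, not PDB residue numbers.
    """
    lines = []
    for mono1, index1, atom1, mono2, index2, atom2 in bonds:
        lines.append(f"bond chem {mono1} {index1} {atom1} {mono2} {index2} {atom2}")
    lines.append("*EOD")  # marks the end of the bond list
    return "\n".join(lines) + "\n"


# The proximal histidine to heme iron bond used in this tutorial.
text = addbond_text([("HIS", 94, "NE2", "HEM1", 156, "FE")])
print(text, end="")

# To save it under the name used in this tutorial, one could write:
# with open("2bw9.addb", "w") as handle:
#     handle.write(text)
```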
What can add a little to the complexity is the numbering of the monomers (above we
have numbers 94 and 156). The numbers of the protein data bank file are NOT what we
want. There is a reason for our processing of the PDB file. Not only do we add more
atoms like hydrogens (that changes the numbering of atoms), but also the numbering of
the residues/monomers within the protein molecule changes.
MOIL has special monomers for the N-terminal and C-terminal called NTER and CTER
respectively. If you do not trust me on this check the crd file which was generated before.
Since we already ran the “Process PDB” procedure, a crd file exists. In the “Processing
PDB” work sheet click on view on the right of the line titled Coordinate File. A new
window opens that displays the coordinates that MOIL generated from the PDB file as
shown below. The format is compatible with the crd format of the program CHARMM.
The first two lines that start with a “*” are comments. The third line has a single integer
which is the total number of atoms in the file (and of the molecule, this number must
match the number of atoms in the connectivity file (*.wcon)). The fourth and the fifth
lines are two hydrogen atoms covalently bonded to the first nitrogen of the polypeptide
chain that are assigned to the monomer NTER (N terminal). They are designated as
residue/monomer no. 1. Hence, all the other indices of the monomers that follow are
shifted by one with respect to the PDB indices. Of course it is assumed that no more
adjustments to the residue index will happen later. An example for possibly confusing
and inconsistent numbering is the following: Some PDB files include a ligand (e.g.
carbon monoxide) as a separate residue, others include it as part of the HEME monomer.
Obviously the files with inclusion or exclusion will have a different index for the carbon
monoxide.
In principle a code could be written that relates the PDB index to the index of the crd file.
Such a code will make it possible to use in the addbond file the indices from the PDB.
Unfortunately, this code has not been written yet.
We know that the proximal histidine has an index around 90. We may even identify the
histidine bound to the HEME by looking at the PDB file using ZMOIL graphics. The
bottom line is that without experience and extensive knowledge of addition and
subtraction we cannot pick the residue number from the PDB. It is safer to pick the
numbers directly from the processed files of MOIL avoiding potential errors and the use
of the expertise mentioned above. If we look directly in the PDB, the histidine index is 93
and the HEME index is 154, instead of 94 and 156 which I typed above.
Note also that the HEME is called HEM in the PDB and HEM1 in the crd file. The
reason is that different equilibrium configurations of the heme are possible. The heme’s
nitrogens and iron form a plane if the iron is coordinated to six atoms (bonded also to the
ligand) and the iron is slightly out of the plane of the heme if it is coordinated to five
atoms only (no bonded ligand). We are interested in the last case (unbound ligand) that
will allow us to explore ligand diffusion.
So, how did I identify the indices of the relevant binders? One way of doing it is to look
at the connectivity file. The incorrect file we generated without the extra bonding is
indeed incorrect bonding-wise, but it is still true indices-of-monomers-wise. We look for
the histidine whose index is close to 93 (HIS 94) and the index of the heme group (156). We
realize at this point that the generation of the wrong connectivity file (even without the
extra bonding to the heme) was not a total waste. We are using the *.wcon file to figure
out the right indices for the required addbond file. Of course we cannot exclude the
possibility that two HIS are going to be near each other in sequence and the procedure
above will give a wrong answer for the index of the proximal histidine. A safer approach
is described below.
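The processed coordinate file carries the same MOIL indices, so a short script can list the candidates for you. The Python sketch below assumes a CHARMM-style crd layout: comment lines starting with “*”, one line holding the atom count, then atom lines whose second and third whitespace-separated fields are the monomer number and monomer name. That layout is my assumption from the description of the crd file earlier in this tutorial, so check it against your own file. The function name and the tiny example file are invented.

```python
def monomer_indices(crd_lines, names):
    """List (monomer name, monomer number) pairs for the requested monomer names."""
    found = []
    for line in crd_lines:
        if line.startswith("*"):
            continue  # comment line
        fields = line.split()
        if len(fields) < 7:
            continue  # the atom-count line or a blank line
        number, name = int(fields[1]), fields[2]
        if name in names and (name, number) not in found:
            found.append((name, number))
    return found


# An invented miniature crd file; coordinates are placeholders, not 2bw9 data.
example_crd = [
    "* made-up example",
    "* not real coordinates",
    "    4",
    "    1    1 NTER H1     0.000   0.000   0.000",
    "    2   65 HIS  NE2    1.000   0.000   0.000",
    "    3   94 HIS  NE2    2.000   0.000   0.000",
    "    4  156 HEM1 FE     3.000   0.000   0.000",
]
print(monomer_indices(example_crd, {"HIS", "HEM1"}))
```

On a real file you would pass the lines of 2bw9.crd; the list of HIS entries still has to be narrowed down to the proximal one, which is what the graphical check that follows is for.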
Another option that we will follow now is to use the ZMOIL graphics to identify the two
monomers. In the “moil.tcl” window click as before on View Structure and select New
Config File. A new window appears. The 2bw9.wcon and 2bw9.crd file names should be
filled already into the spaces near Connectivity File and Coordinate File. ZMOIL
“learns” about progress in other MOIL branches and is using up to date crd and
connectivity files. If the above two file names do not appear in the corresponding file
lines, choose them via the Browse button, or type them in. Note that if you type them in,
you must ensure that the gray button to the right of Coordinate File giving the file type is
correct. For a crd file, the button should read CRD, and so on. Click it if it needs
adjustment.
We are interested in the heme and the residues that are proximate to it. It therefore makes
sense to make a selection. In the same workpage titled Cmoil structure 1 find the line
Pick Display Center. In the box for Cutoff Distance type “4”. Atoms that are within 4
angstroms of the target (any heme atom – to be selected) will be included in the figure.
Next, click on pick. A new “Pick Lists” window appears as shown below. Use the lower
right corner to increase the window to full size if necessary and use the scrolling bar in the
middle to find the monomer/residue HEM1 (the residue number had better be 156).
Highlight HEM1 and then click on insert. Then click OK. The selection window closes
and we return to the Cmoil structure 1 worksheet. Go to the bottom of the page and click
on View (zmoil).
Only a subset of atoms shows up. There are two histidines sufficiently close to the heme.
This is not surprising: one of them is the distal histidine and the other the (desired)
proximal histidine. Click on one of the atoms of each of the histidines with the left button
of the mouse while holding the “shift” key (you may press ‘p’ to reset the picked atoms).
One of the residue numbers that you will read at the bottom left corner of the graphic
screen is 65 while the other histidine index is 94. The heme index is 156. Since all this
renumbering is done with the crd and wcon files, the indices stand true for all MOIL
applications. It is therefore not difficult to make the logical leap and conclude that the
proximal histidine is indeed 94 (the proximal histidine is coming after the distal histidine
in sequence). Further examination of the picture reveals that the proximal histidine is
better oriented to interact with the heme iron and that it is actually closer (you can
measure the distance by choosing an atom from the histidine, then one from the heme,
and pressing ‘d’ to display the distance. Buttons are available for this in the left-hand
panel). In conclusion, 94 and 156 are the indices we are after.
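The "which histidine is closer to the iron" test can also be done numerically once you have coordinates. Here is a minimal Python sketch of that comparison. The coordinates are invented purely to show the mechanics and are not taken from 2bw9, and the function name is hypothetical. With real data you would plug in the FE and NE2 positions read from the crd file.

```python
import math


def closest_atom(target, candidates):
    """Return the label of the candidate atom nearest to the target position."""
    return min(candidates, key=lambda label: math.dist(target, candidates[label]))


# Invented positions in angstroms, for illustration only.
iron = (0.0, 0.0, 0.0)
histidine_nitrogens = {
    "HIS 65 NE2": (0.0, 0.0, 4.3),
    "HIS 94 NE2": (0.0, 0.0, -2.1),
}
nearest = closest_atom(iron, histidine_nitrogens)
print(nearest, round(math.dist(iron, histidine_nitrogens[nearest]), 2))
```

Distance alone is a blunt instrument, so treat it as a cross-check of what you see in the picture rather than a replacement for looking.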
We comment that a similar procedure must be applied to generate S-S bonds (e.g. for a
pair of bonded cysteines). The good news for heme bonding is that MOIL recognizes all
the bonds and there is no need to generate a new set of parameters etc. However, it is
possible that you will attempt to study a molecular system that consists of pieces not
known to MOIL. In that case you will get a bitter complaint from MOIL that will kick
you out. You will need to generate new entries to the ALL.MONO and ALL.PROP files.
This is illustrated for the molecule BENZENE in a tutorial, but by no means is an easy
task in the general case. If you select Help and then tutorial 1 from the “moil.tcl” window
you will be presented with the benzene example of this feature of MOIL.
So, this was a long story to explain in one line. Are we there yet? Of course not! Here is
the next barrier which is the “Modification File” in the processing pdb work sheet. Close
ZMOIL, and once again open the Processing PDB worksheet if it is not still open (via
Assemble/ProcessPDB). In the Modification File field, give the modification file a name
2bw9.modi
The modification file allows you to fix components of the covalent structure that were
generated automatically by MOIL and that you do not like. All angles, torsions, and improper
torsions in MOIL are generated automatically from the bond structure. For example, any
pair of bonds that share one atom defines an angle. This automated procedure is unique
and saves a lot of work compared with an attempt to define all the covalent terms manually.
However, sometimes MOIL makes a mistake and generates undesirable
angles/torsions/improper torsions. So the file of “Modification file” is a way for the user
to fix the bugs that MOIL creates. It may be possible to do a better job internally within
MOIL and to catch all these special cases using computer code. However, for some
esoteric systems the user may want to have the capacity to define the covalent structure
differently. So the file of modification is not only a bug but also a feature. Choose “Edit”:
as before you will get a complaint that the file does not exist, but if you will hang in there
a little longer and click on close, an “Editor” window will open allowing you to type into
the new file. Here is what I typed in (and then clicked on Save)
remo angl chem HEM1 156 NA HEM1 156 FE HEM1 156 NC
remo angl chem HEM1 156 NB HEM1 156 FE HEM1 156 ND
*EOD
The text is almost self-explanatory. What we do is remove two angles from the list of
angles that MOIL keeps for this molecule. Translated into English, the first line reads:
REMOve ANGLe using CHEMical formulas. The angle we wish to remove consists of
three atoms (all belonging to the residue HEM1). It is defined by three sequential
atoms, with the iron (FE) in the center and two of the heme’s nitrogens, NA and NC,
flanking it on either side.
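The two lines above follow a regular pattern, so a longer list of removals could be generated rather than typed. Here is a minimal sketch in Python, assuming only the line layout shown above (the keywords remo angl chem followed by three residue-name, residue-number, atom-name triples, and a closing *EOD). The function names are made up for illustration and are not part of MOIL.

```python
def remove_angle_line(residue, number, atoms):
    """Format one 'remo angl chem' line for three atoms of the same residue."""
    if len(atoms) != 3:
        raise ValueError("an angle is defined by exactly three atoms")
    parts = ["remo", "angl", "chem"]
    for atom in atoms:
        parts += [residue, str(number), atom]
    return " ".join(parts)


def modification_file_text(angles, residue="HEM1", number=156):
    """Build the full text of a modification file, ending with *EOD."""
    lines = [remove_angle_line(residue, number, a) for a in angles]
    lines.append("*EOD")
    return "\n".join(lines) + "\n"


# The two near-linear N-FE-N angles discussed in the text.
text = modification_file_text([("NA", "FE", "NC"), ("NB", "FE", "ND")])
print(text, end="")
```

The printed text can be pasted into the Editor window or written to 2bw9.modi.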
Why do we want to remove these angles? This is a moderately long story, so take a deep
breath (or skip this paragraph). The covalent structure of the heme includes four nitrogen
atoms (NA, NB, NC and ND) that bind to a central iron atom. The geometry is that of a
cross, and the deviations from a plane are not large even for an iron with a coordination
number of five. Obviously MOIL generates all the angles of bonds that share the iron
atom, that is, all the angles of the type X-FE-X where X is any atom. For each angle
MOIL assigns an energy term of the form U(Θ) = k(Θ − Θ0)^2, where k is the force constant,
Θ is the angle between the two bonds, and Θ0 is the equilibrium value. The equilibrium
value and the force constant are estimated from a set of small molecule values. However,
NA-FE-NC is (or almost is) linear. This is an unusual angle for organic-like compounds
(which proteins are) and is not handled very well by the default energy formulation of
angles that was mentioned above. In fact the forces, -dU(Θ)/dr, are singular where sin Θ
vanishes, that is, for Θ near 180 degrees (or near zero): the expression divides by zero.
These angles are also not necessary to reproduce reasonable heme geometry. The
approximately right angles (e.g. NA-FE-NB) are doing this job. We therefore remove the
close-to-linear angles that can cause a lot of numerical troubles. This is what the
Modification File does.
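To see where the trouble comes from, note that the angle is usually computed from cos Θ (the dot product of the two bond vectors), so the chain rule brings in a factor dΘ/d(cos Θ) = −1/sin Θ. The sketch below evaluates the harmonic energy and that factor as the angle approaches linearity. The values k = 1 and Θ0 = 120 degrees are arbitrary illustrations, not MOIL parameters, and the code is not taken from MOIL.

```python
import math


def angle_energy(theta_deg, k=1.0, theta0_deg=120.0):
    """Harmonic angle energy U = k * (theta - theta0)**2, angles converted to radians."""
    d = math.radians(theta_deg - theta0_deg)
    return k * d * d


def force_prefactor(theta_deg, k=1.0, theta0_deg=120.0):
    """dU/d(cos theta) = -2k(theta - theta0) / sin(theta); grows without bound as sin(theta) -> 0."""
    d = math.radians(theta_deg - theta0_deg)
    return -2.0 * k * d / math.sin(math.radians(theta_deg))


for theta in (90.0, 170.0, 179.0, 179.9):
    print(f"theta = {theta:6.1f}  U = {angle_energy(theta):.3f}  "
          f"dU/dcos = {force_prefactor(theta):.1f}")
```

The energy itself stays modest, but the prefactor of the force grows rapidly as the angle approaches 180 degrees, which is the numerical trouble that removing the near-linear angles avoids.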
Are we done yet?
Nope.
The last item on the agenda of computing the connectivity file is the generation of a water
box. Let us generate a box of 60 angstrom around the myoglobin molecule. Go to the
bottom of the processing pdb page and type “60” in each of the empty text boxes near
XWBX YWBX and ZWBX, which mean the length of the X/Y/Z edge of the Water
BoX, then click on Run Locally. You will see more red windows flashing on and off
(there are a couple of programs participating in the generation of the new system of
myoglobin + water; the fact that the windows do not stop and complain is actually
reassuring). At the end you should see a window “Connectivity database bla bla bla …
has been successfully generated”. Click on Continue.
All the windows (with the exclusion of moil.tcl) disappeared. This means that we are all
set (again) to go to level 2. Let us first look at the monster we just generated by going
again to the View Structure subprogram. We just need to select New Config File. A new
page with the names of the necessary files will show up. Click on Secondary Structure in
the Display Mode to make the picture a little “brighter” and go to the bottom of the page
(yes, once more) and click on View (zmoil). You will see a beautiful water box and a
ribbon structure of myoglobin as shown below (if you do not, ensure that the option
“Display Water” is enabled in the Display Options portion of the user interface). You can
modify the picture to a space-filling model to make sure that the protein is reasonably well
covered with (red) water molecules. If you do, you will notice that there are a few white
(protein) spots visible through the layers of water molecules, but we will let it go this
time, rather than re-running with an even bigger box of water.
Let us now make a short molecular dynamics trajectory. Close the Cmoil Structure 1
window and go back to the moil.tcl main window. Click on Calculate and select
Dynamics and New Config File. A new window appears (what else?!) with a lot of
parameters as shown below. At the moment we will change nothing. A few of the default
values are already printed in. For example #STE is the number of integration steps and it
is currently set to 100.
A simple way of getting more information about these parameters is to place the mouse
on top of their text (e.g. on #crd; the mouse pointer will take the form of a hand) and
then double-click. A new window will show up with a short text explaining what this
parameter is all about. For example double-clicking on #crd produces a new window with
a short text as shown below. In fact the default value for dyna quoted there is not up to
date. In the dynamics program dyna it is 1.
Note that the input files in the Dyna Information workpage are the connectivity file
(*.wcon) and the coordinate file (*.crd). The connectivity file, which we just generated, is
displayed in gray. This means that you can browse the file but you CANNOT edit it.
Editing a connectivity file (unless you are a MOIL wizard) is likely to end up in a
disaster, and is better avoided. Editing a crd file is also not recommended, but can be
helpful sometimes if you wish (for example) to displace the carbon monoxide ligand to
some other binding site. The output files are a *dcd file (which as in CHARMM contains
the sequential coordinates from integration of the equations of motion), and a *dvd file,
which is rarely used in MOIL and includes the velocities. The standard output provides a
text report on the progress of the simulation.
Anyway, let us try to cut it short and click on Run Locally. A new warning window
appears claiming that “symm constraint is essential….”. It is right. I forgot to define a
periodic boundary condition for the simulation, and the program wishes to check if I
really want to simulate a borg ship of water and myoglobin in vacuum, or perhaps add the
symmetry constraint. We can try to add the symmetry constraint “as is”. The program
will use the 60 angstrom size that we used when generating the box to initiate the run.
However, this is not wise. Our box is not equilibrated yet, which means that we are not
ready yet to use the desired size of the box.
Just to exemplify the problem click on yes (yes we DO want to have a symmetry
constraint of a box of 60 angstrom). Windows will pop up and let you know that the
program started. And in about a minute (depending on the power of your computer)
another window will let you know that the program has finished. The graphical viewer
ZMOIL will open, but just close it for now. You will also see a Viewer showing the text
output from this last dynamics computation. Search for the pattern ENERGIES:. The
information that immediately follows is a list of all the different energy terms. Note the
energy “E evsym” is all stars. The stars are not good. It means that the energy exceeds the
pre-determined format of MOIL. This happens when the energy is very high. In this
particular case the “bad” energy is of the Lennard Jones repulsion between the primary
and image boxes. The bad overlaps between the primary box and the images made the
simulation unstable. It is necessary to go back and to increase the initial size of the box to
a size that is more acceptable energetically and slowly (linearly, and at each step)
compress the box to the desired size. Close the viewer.
We go back to the Calculate/Dynamics/New Config File worksheet but this time we go to
the line of “Symmetry Parameters” and we click on the Display on the right side of the
line. A new window will appear with a set of parameters to consider. The lines we need
to touch are the first two: SYMM and SYM2. The SYMM line defines the box size that
we want to start with. The SYM2 line defines the box size that we want to finish with. In
the SYMM line we type “64” for XTRA, YTRA, and ZTRA, and in the SYM2 line
entries XTR2, YTR2, and ZTR2 we type “60”. Then click on close. In the main dyna
window type “100” on the right side of #EQU -- We will need some EQUilibration to
adjust the box size. After setting the parameter #EQU click on Run Locally as we have
done many times by now.
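The effect of SYMM, SYM2 and #EQU together can be pictured as a straight-line schedule for the box edge. The sketch below assumes that the edge shrinks linearly from the SYMM value to the SYM2 value over the #EQU equilibration steps and stays fixed afterwards. The text says the compression is linear and applied at each step, but the exact update rule inside dyna is my assumption, and the function name is made up.

```python
def box_edge(step, n_equ=100, start=64.0, end=60.0):
    """Box edge (angstrom) at a given step, shrinking linearly during equilibration."""
    if step >= n_equ:
        return end  # after equilibration the box stays at its final size
    return start + (end - start) * step / n_equ


for step in (0, 25, 50, 75, 100):
    print(step, box_edge(step))
```

With the numbers used here (64 down to 60 angstrom in 100 steps) each step shrinks the edge by 0.04 angstrom, which is gentle enough to avoid the bad overlaps with the image boxes.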
The program will display two windows: one to let you know that it is starting
and a second to tell you that it ended. Once complete, the graphical viewer ZMOIL and the
text-output view will be displayed. We could explore the motion graphically at this point, but
we’d like to setup some special viewing options, and you’d better learn how to bring up
the viewer from scratch, so just close the viewers.
Now we are back with the lone window of moil.tcl. The last accord of this far-too-long or
rambling-on-and-on tutorial is a display of the results of the dynamics. Click on View
Structure and choose New Config File. We wish to look at dynamics so we need to
change the selection of the file. Sometimes the program already inserts the filename and
the Browse is not required. However, you still need to make sure that the extension (the
file type) is correct. Go to the right-hand side of the Coordinate File line; the file type should
be DYNA (a dynamics file by default ends with dcd). If it is not DYNA, make it so. The
interface recognizes the meaning of different file extensions (dcd -- dynamics, crd –
coordinates, pdb – protein data bank). If you have your own original extension name you
can always choose manually the type of the file from the menu on the right. Needless to
say that having the correct file type defined for your run makes a difference. Ensure
2bw9_dyn.dcd, the dynamics file we just computed, is listed as Coordinate File (or
different if you’ve been choosing your own names!). If not, make it so by browsing or
typing the name. Note: the “swap” button next to the file type should typically be on.
MOIL binary files are written in “big-endian” format, a format once common on large
mainframe computers, and still used by some processors. If you are running on an Intel
or AMD processor, which is likely, the viewer needs to “swap” the order of the binary bytes as
they are loaded in order to read the data correctly.
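What the swap box does can be seen with four bytes. The sketch below uses Python's standard struct module to write the integer 100 in big-endian order and then reads it back with and without respecting that order. It illustrates byte order in general and is not the code that zmoil uses.

```python
import struct

raw = struct.pack(">i", 100)                 # 4-byte integer written big-endian
wrong = struct.unpack("<i", raw)[0]          # read as little-endian, no swap
right = struct.unpack(">i", raw)[0]          # read with the correct byte order
swapped = struct.unpack("<i", raw[::-1])[0]  # reverse the bytes, then read as little-endian

print(raw.hex(), wrong, right, swapped)
```

Without the swap the value comes out as 1677721600 instead of 100, which is why a viewer that guesses the byte order wrongly shows nothing sensible.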
One of the parameters of the dynamics run was #crd (the frequency of saving coordinate
sets). In our run it was set to one. This means that for every molecular dynamics step we
have a structure that was saved to the disk. Since the total run was of 100 steps we have
100 structures. The program needs to know how many structures to expect from the file
of coordinates that was obtained from dyna (it wants to know how many you would like
to view, up to a maximum of the actual structures recorded to disk – if you are ever
unsure, just enter a large number). Find the line Number of Structures and write in the
box on the right “100”. As before we will click on Secondary Structure button, which can
be found in the line Display Mode. We also wish to view more clearly the ligand (carbon
monoxide). Therefore we will draw it using a space filling model. We move to the line
Pick for Spacefilling. Click on the Pick on the right. Scroll the middle bar and highlight
the residue CMO (or CO if you changed it) (it is number 157 on the list). Click on insert
and then OK. We are back in the Cmoil structure 1 worksheet. Click on the bottom of the
page on View (zmoil).
If the program does not display a structure go back to the Cmoil structure 1 worksheet
and to the line Coordinate File. Check the box near swap. There are two ways of
generating unformatted files, with and without byte swapping, and apparently, the way the
programs are set up now on my computer, this box should have been checked. Press
View(zmoil) again to have a look.
To enjoy the dynamics (it is short, only 100 femtoseconds, so do not expect anything
dramatic to happen) go to the left to the sub-window called “Structure Animation
Controls”. You may wish to click on movie mode or, even more entertaining, go to manual
and use the diamond just below the title. Drag the diamond with the mouse at a rate
of your convenience to display the structures sequentially. Note that 100 femtoseconds
are enough to see some ligand rotation (but not spatial diffusion).
Ok. One more exercise due to popular demand:
More water, please (or advanced swimming class)
We now prepare a simulation of protein in water with long-range summation of
electrostatic interactions (Ewald sum) and a series of other goodies to make our
simulation more professional.
We have done quite a bit of work so far, so why not build on it? Go to
Calculate/Dynamics/New Config File. We will revisit some of the parameters and will
find some new pits to fall into. The good news is that we will solve (eventually) all of the
problems.
Let us first outline a plan. As in any war, the plan will survive only until the first
shot, but it is still good to have a plan. We will have 3 steps: 1. Equilibrate the water, 2.
Equilibrate the water and the protein, 3. A production run.
Step 1: Our initial task is to equilibrate the structure of the water around the protein. The
way we constructed the water box is by simple geometrical considerations. We used a
pre-equilibrated pure water box of 60x60x60 angstroms, placed the protein in the center,
and removed all the water molecules with significant overlap with any of the protein
atoms. However, we did not test for correct orientation of the water molecules, and the
empirical distance we used to remove the offending solvent molecules is somewhat
arbitrary. For example, it is expected that the water molecules will be closer to charged
residues than to hydrophobic residues. Our initial procedure is of one-distance-for-all
and there is no differentiation between amino acid types. This non-optimal solution
requires proper equilibration of the water molecules around the protein to obtain a sensible
room-temperature configuration. To start with, we trust (completely) the crystallographer
and use the protein “as is”. Only the water will be allowed to move.
OK. Let us be specific and make the concrete changes on the work sheet. We start with
Basic Parameters. First, #ste: 100 steps is a bit low for serious equilibration work. Even the
shortest relaxation times of liquid configurations are in the picoseconds and not in the
femtoseconds. We still want to keep it reasonably short, so make it #ste 5000. The #equ is
the number of equilibration steps. Since we are now doing ONLY equilibration and no
data collection, the number of equilibration steps should be made equal to the number of
steps. Make it #equ 5000. The number of steps between printouts of information (INFO)
should be set to 10. When we are ready to launch a looooooong simulation, printing every
10 steps is likely to be too much. However, in the beginning at least, it is good practice to print
frequently. It will help to catch those bugs early in the day, before it gets dark
and messy in complex and advanced simulations and the bugs become more elusive.
Make it Info 10. The next item on the agenda is #crd, which is the frequency at which we save
coordinate sets to the disk (to the file 2bw9_dyn.dcd). Let it be #crd 1000 (we do not
need many structures here during the equilibration, just the last structure to continue the
process). The variable #vel is the frequency at which velocities are written (to the file
2Bw9_dyn.dvd). Velocities are of no use to us at the moment and we therefore will not
change the default number (which should be zero: no velocities are written).
The variable #lis is the frequency at which the non-bonded interaction lists are updated.
Since we will use Ewald (and a short real-space cutoff distance) we should update quite
frequently, which means in this case #lis 10. The number should be bound between 5
and 20. Less than 5 does not make sense, since the update of the lists is computationally
expensive and in fewer than 5 steps the system hardly moves at all. More than 20 steps
is seeking trouble. In 20 steps the system can change enough that abrupt non-conserving
energies will be observed. We will not touch most of the upcoming variables,
so from now on I will discuss only variables that we modify.
The variable RVMX is the Van der Waals distance cutoff and we can safely set it to 8.0
instead of the default which is now at 9.0 (angstrom). The other distance cutoff that we
touch is RELX (cutoff for real space electrostatic interactions) which instead of 12.0 we
set to 9.0. An important thing to remember about the van der Waals and electrostatic cutoffs is
that MOIL requires the electrostatic cutoff to be larger than the van der Waals cutoff. We
cannot have (for example) RVMX equal to 12 while RELX equals 9. It does not make
sense from a physical viewpoint, since the van der Waals interactions decay like r^-6 while
the electrostatic interactions decay much more slowly, like r^-1. Regardless of whether it
makes sense or not, MOIL does not check for correctness in this case. However, it
uses a few speed-up tricks that rely on the distance hierarchy I just mentioned. It
produces wrong results if you set the wrong hierarchy! So be warned, be careful, and be
right.
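Since MOIL does not check the cutoff hierarchy for you, it may be worth checking the numbers yourself before clicking Run Locally. Here is a small sketch of such a check in Python. It encodes only the two rules of thumb stated above (RVMX smaller than RELX, and #lis between 5 and 20). The function is a hypothetical helper of mine and not something that ships with MOIL.

```python
def check_cutoffs(rvmx, relx, lis):
    """Return a list of complaints about the cutoff and list-update settings."""
    problems = []
    if not rvmx < relx:
        problems.append(
            f"RVMX ({rvmx}) must be smaller than RELX ({relx}): "
            "the electrostatic cutoff has to be the larger one"
        )
    if not 5 <= lis <= 20:
        problems.append(f"#lis ({lis}) should be between 5 and 20")
    return problems


print(check_cutoffs(rvmx=8.0, relx=9.0, lis=10))   # the settings used in this tutorial
print(check_cutoffs(rvmx=12.0, relx=9.0, lis=10))  # the wrong hierarchy from the example
```

An empty list means that the settings respect both rules.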
We also want to check the box near the variable HVDW; it means that the hydrogen
atoms have a small (non-zero) van der Waals radius, which prevents them from collapsing
on top of other (negative) atoms. Adding (small) hard cores to the hydrogen atoms adds
to the stability of the calculations and prevents a Coulombic “explosion” if opposite charges
occupy the same position in space (something which is allowed if the van der Waals
radius of the atom is zero). A zero van der Waals radius for hydrogen atoms is used in OPLS, the
force field that we employ. There are also quite a few variables that look something like
NOxx NOxx NOxx, which reminds me of the way I talk to my son. Seriously, these
parameters allow us to turn off different energy options. For example checking the box
near NOEL will turn off the electrostatic interactions. These options are mostly
convenient for debugging.
We leave all the NOxx boxes unchecked and move on to the Symmetry Parameters. Click
on Display on the right. In the newly opened window titled Symmetry Parameters for
Dynamics we need to work on several boxes. First we simulate water equilibration (as
before). For convenience some of the explanations are repeated. The box that we cut to
begin with (60x60x60 angstrom) is likely to have hard collisions at the box boundary. We
therefore start the calculation with a somewhat larger box and equilibrate it to the desired
box size. The first line of the new window requires filling the boxes for XTRA YTRA and
ZTRA. These are the parameters for the box sizes we begin with. The starting box should
be larger by several angstroms compared to the desired box size, so we use 64 for all
three. In the line below with the entries XTR2 YTR2 and ZTR2 we put the final desired
size of the box (60) three times. This is exactly the same as we did before. And here
comes the exciting new part. To allow for Ewald summation check the box near EWAL
and make DTOL 0.0001 (allowed error parameters). It is also good to set GRDX GRDY
GRDZ (the grid sizes) to 32 each. This is it. We are done with the Symmetry part. Close
the Symmetry window. We are back at the Dyna Information sheet continuing to our next
adventure at Constraint Parameters. Click on display at the right side of Constraint
Parameters.
There are two types of constraints that we wish to turn on. The first requires a click near
MSHK (the line of Matrix SHaKe). It makes the water molecule TIP3 rigid. Since TIP3
was designed as rigid (by W. Jorgensen), it makes sense to keep it so. The second
constraint that we wish to use is of NFRoZen. It picks up the particles that will NOT be
frozen in the upcoming simulation. In our case we should select (using the Pick command
on the right) the TIP3 molecules. The rest of the particles (i.e. the protein) will be frozen
in their initial configuration, allowing relaxation of the water molecules (only). Click on
Edit on the right; in the new window that appears scroll the bar in the middle until the
name TIP3 shows up in the list of monomer names. Click on one of the TIP3 (does not
matter which) and then click on Insert in the center. In the selection line below we will
see the line “chem mono TIP3”. Click on OK. You have returned to the Constraint
Parameters window and the selection has been transferred to the right of NFRoZen. Click
on Close.
We are back at the Dyna Information window. Click on Run Locally. You will receive a
message that the program is starting and immediately an error message that states “Failed
to run… “. This should not really surprise us given our past experience. Let us see if we
can figure out what the problem is. Click on View & Save. A new window shows up
titled “Saved ErrMsg File”. Use the right side bar to scroll down to the end of the file.
There you will find the message “The charge of the system is 1.000000…”, followed by
a RED ALERT and a suggestion: Add counterion and try again. Ouch, here is something
we indeed need to fix. The Ewald summation adds up the electrostatic energies out to
infinite distance, duplicating the basic water box by symmetry. This is an approximation,
since in real solutions of water and proteins the symmetry is obviously lower.
Nevertheless, it is one of the better approximations we have at present. However, if the
box is charged then the summation over boxes created by the symmetry operation will
make the system macroscopically charged. This is clearly incorrect modeling, so we require
that the system be neutral.
The protein at hand has a net charge of +1 electronic charge. What we need to do at
present is to add a counter ion and make sure that the box is electrically neutral. Where
should we put the ion? The solution will be equilibrated as discussed above (with an ion
or not), so there is no need for high accuracy in placing the ion (unless there are specific
binding properties for the ion that require more detailed modeling). Since no such
counter ion was seen in crystallography, we assume it is a mobile ion and can be placed
anywhere in the box. The way we will do it is by replacing one of the many water
molecules by a chloride ion. We need to make changes in two files: in the coordinate file
(*crd) and in the connectivity file (*wcon).
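The bookkeeping behind the counter ion is simple: add as many singly charged ions of the opposite sign as the net charge reported by the program. A minimal sketch follows, with a made-up function name. The label Cl is the ion used in this tutorial, while Na for a negatively charged system is my assumption and may not match the name MOIL uses.

```python
def counterions_needed(net_charge, tolerance=1e-6):
    """Return (ion_label, count) that neutralizes an integer net charge."""
    rounded = round(net_charge)
    if abs(net_charge - rounded) > tolerance:
        raise ValueError("net charge is not an integer; check the system first")
    if rounded == 0:
        return (None, 0)
    # "Cl" is the ion used in this tutorial; "Na" is an assumed label for the opposite case.
    return ("Cl", rounded) if rounded > 0 else ("Na", -rounded)


print(counterions_needed(1.000000))  # the net charge reported for our system
```

For the charge of 1.000000 reported above, this gives one chloride ion, which is the single water molecule we replace next.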
Lucky for us moil is reasonably well equipped to deal with ions and the required
modification of coordinate files. We return to the main menu of moil and select
Assemble/Build Connectivity/Reload Config File . Assuming that you are in the same
directory that we worked in before you can identify a file 2BW9_con.in, highlight it with
the mouse and select OK. A Connectivity File information worksheet is now open. Since
we reloaded the old file a lot of entries were refilled already saving us time and
aggravation. At the third file line (the line that says Polymer File) there is a button on the
far right that says Add Ions. It looks like exactly what we need, so click on it. We’ll replace a water
molecule with a chloride ion, so check the box near Cl to use this type of ion. You then
need to choose the water molecule that will be replaced. The simplest is to use the side
bar to scroll to the end of the list and pick the number of the last water molecule (should
be number 5753, but this can change when the PDB structure is updated). Note: if you
choose the water molecule first and then choose the Cl ion, you may find that the water
molecule has become deselected in the list; therefore, choose the ion first… Finally, make
sure that the crd file to update is named as our previously generated 2BW9.crd, if it was
not there already, and then click on Save. You will receive a notice that one water
molecule was changed on your behalf. If instead it says no ions were added, go back and
try again, and ensure when you click Save that both a water (number) from the list and
an ion type are selected, and that a valid filename is present.
Return to the Connectivity File Information (it should have remained open). To improve
the generation of this connectivity file we examine the section Configuration Parameters.
Check the box near MSHK. It signals to the code to keep the distances within a water
molecule fixed (the water molecule is treated like a rigid object). From the perspective of
the connectivity file (*wcon) it means that no internal bonds of water molecules should
be included (they are kept rigid by geometric, not energetic constraints) which makes the
later calculations of the dynamics more efficient. Then check also the box near HVDW.
We have checked an identical box also in the Dynamics Information work sheet. It is
actually more important to check it here. The connectivity file is where the information
on the van der Waals radii of all the atoms is stored. It also generates appropriate 1-4
interaction terms that include the HVDW adjustment. After these two small adjustments
we can hit the Run Locally button. The connectivity file should be generated successfully
(and you should receive a message to that effect in a separate small window). Click on
Continue in the Information window. We now return to the MOIL main menu and we are
ready to try Dynamics again.
A comment: Obviously we would not have had to go through this trial-and-error series if
we had been better informed. For example, if we had known that the system we prepared has
a net charge of one, we could have skipped the futile attempt at running dynamics
(which we made before) and, immediately after Processing PDB, continued to generate
new connectivity and coordinate files with the ions inserted. However (and this is a BIG
“however”), we rarely know the charge of a protein unless we have already dug into it (a clever
programmer could have done this as an internal MOIL check, something to think about in
the future…). In this case it may be easier (within MOIL) to let the program find the net
charge for you and then fix the connectivity file and add ions to the coordinate file. So the
path we took up to this point is actually typical.
Select from the main menu Calculate/Dynamics/Reload Config File. We can use the
older file since it was ready to use once we fixed the overall charge of the system. There
is no need for extra practice of your typing and mouse clicks on this page anymore.
Select in the file window 2BW9_dyn.in and click OK. The Dynamic Information page
appears again. Make sure that all the variables that we inserted before are still there (you
can check them by looking back in this document). If not, insert them. There is one
additional item that we need to take care of after adding the ions, which belongs to the
Constraint Parameters. Previously we froze everything with the exception of the water
molecules (i.e. we froze all the protein atoms). Now we have a chloride ion which should
be mobile too. So our selection of which particles NOT to freeze must be modified. Click
on Display on the right hand side of the Constraint Parameters line. A new window
shows up. Go to the NFRoZen line and click on Edit. When the Pick Lists window opens
click on Clear at the bottom. Then drag the middle bar to the end of the file. Highlight the
monomer CL and then click on Insert. Then click on the button to the left of OR and
finally Highlight a TIP3 and click on Insert again. The line of selection should have the
text chem mono CL | chem mono TIP3 . Click on OK. Back in the Constraint Parameters
… window we realize that now both types of monomers CL and TIP3 are not frozen in
the simulations which is what we wanted (you should see this indicated in the NFRoZen:
PICK line) Click on Close. We return to the Dyna Information window.
Are you ready for the big moment? Ready or not, click Run Locally. A window shows up,
informs you that the program is running, and suggests that you click on
Continue. Do it and go get a cup of coffee. First, you deserve it. Second, it will take some
time to execute the 5,000 steps we requested. [On an Intel 2 GHz MacBook running the
Windows version of the software, this computation took about 30 minutes.]
While we are waiting I can tell you a secret: all the instructions you made are stored
in a file, 2BW9_dyn.in. In general the graphic interface of MOIL saves input files with
the extension *.in. It is useful sometimes to have a look to make sure you were
understood correctly and to avoid some unpleasant surprises. If you are running on
Windows (like I am doing now to diversify my CTRL-ALT-DELETE and ALT-COMMAND-ESC
abilities) you can watch the progress of the dyna program by pressing
CTRL-ALT-DELETE and clicking on Processes in the Windows Task Manager window.
One of the processes should be dyna.exe. If all goes well it should consume a significant
fraction of your CPU time. In fact you can use these input files to run commands from the
keyboard (e.g. run dyna as “dyna < 2BW9_dyn.in > 2BW9_dyn.out”). Running from the
keyboard with text commands is particularly useful when preparing scripts to launch a
large number of jobs, but this is beyond the scope of the present tutorial [running programs
from the command line requires that the folder containing the executables is in your execution
path; adding folders to the path is discussed at the top of this tutorial].
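For readers who do want to script it, here is a minimal sketch in Python of the same “dyna < input > output” pattern, using only the standard library. It assumes that the executable is called dyna and is on your path, as described above. The helper names and the convention of replacing .in by .out are mine, chosen to match the example command.

```python
import subprocess
from pathlib import Path


def output_name(input_file):
    """Map an input file name to its output name: 2BW9_dyn.in -> 2BW9_dyn.out."""
    return str(Path(input_file).with_suffix(".out"))


def run_dyna(input_file, executable="dyna"):
    """Run one job as: dyna < input_file > output_file, and return the exit code."""
    with open(input_file, "rb") as fin, open(output_name(input_file), "wb") as fout:
        return subprocess.run([executable], stdin=fin, stdout=fout).returncode


# Only show what would be run; call run_dyna(name) yourself to actually launch a job.
planned = [(name, output_name(name)) for name in ["2BW9_dyn.in"]]
print(planned)
```

Looping run_dyna over a list of *.in files is then all that is needed to launch many jobs one after the other.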
Ok, the job is over. The Zmoil window just showed up and invited you to free graphic
entertainment. Close that window; we are here to do some serious stuff and have no
time for further distractions. We need to extract the last structure from the dynamics file
(*dcd) and to make it ready for yet another Dynamics run in which the protein will be
allowed to move (finally). In the Main window select Analyze/Convert Coordinates/New
Config File. A new window opens, titled Convert Coordinates. We need to make a better
choice of the Input Coordinate file. Click on Browse and highlight the file 2BW9_dyn.dcd
then press OK. Under Parameters two numbers are requested. LPST is the index of the
starting structure and LPEN is the index of the final structure. In a *dcd file we
keep multiple coordinate sets that were generated during the simulations. The number
of structures is determined by the total number of integration steps (#ste) divided by the
frequency of saving the coordinate set to the disk (#crd). In our case #ste is equal to
5,000 while #crd is equal to 1,000. Hence we expect 5 structures in the 2BW9_dyn.dcd file. We
wish to extract the last structure (the best equilibrated) and to use it to continue the
simulations. Therefore we set both LPST and LPEN to 5. Continuing to the Output
Coordinate File a line below, we need to be a little creative and invent a new name for the
file with the equilibrated structure. Let it be 2BW9_eq.crd, which is what we should type in.
The rest can be left “as is”. Click on Run Locally. A happy report states that ccrd
performed WITHOUT red alert.
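The count of structures in a *dcd file, and therefore the value to give LPST and LPEN for the last one, is simple arithmetic. The sketch below follows the rule stated in the text (#ste divided by #crd). The function name is made up, and treating a save frequency of zero as “nothing saved” mirrors what the text says about #vel rather than anything I checked in the code.

```python
def structures_saved(n_steps, save_every):
    """Number of coordinate sets written: total steps divided by the save frequency."""
    if save_every <= 0:
        return 0  # assumed: a frequency of zero means nothing is written
    return n_steps // save_every


last = structures_saved(5000, 1000)
print("structures in the file:", last)
print("LPST = LPEN =", last)
```

The same rule gives the 100 structures of the first short run (#ste 100, #crd 1).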
Step 2: Our goal is to equilibrate the protein structure together with the water. We
equilibrated the water molecules first, in Step 1, since the water molecules were
far from correctly equilibrated. We expect the initial protein structure to be pretty good (determined by
crystallography and refined with an energy function which is similar in spirit to the one in
MOIL). So this should be a more gentle calculation. We will run a simulation in which
both the protein and the water are allowed to move and we will slowly heat up the protein
from 1K to 300K. Note that there is no need for minimization here. The low temperature
run will take care of bad contacts. In most cases, if the initial structure is of reasonable
quality, minimization is not needed.
We now return to Calculate/Dynamics/Reload Config File. Highlight 2BW9_dyn.in in the
File Open and click OK. The Dyna Information window appears. The selections we have
done previously are there, but we will need to work through some of them and make
changes. The Coordinate File in the second line should be changed to the equilibrated
structure 2BW9_eq.crd. We will make only one change in the Basic Parameters. Go to
the MULtiple Temperature. Make sure that the entry for #TMP is 1. Go one line lower and
enter 1 in the box to the right of TMPI and 300 in the box to the right of TMPF. The initial
temperature will be 1K and the final temperature at the end of 5000 equilibration steps
will be 300K. The temperature will increase linearly in small increments (one per step) by
velocity scaling throughout the simulation.
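The heating schedule can be sketched in a few lines. This is only an illustration of a linear ramp with velocity scaling, not MOIL's actual code: the function names are hypothetical, and the exact schedule MOIL uses internally is an assumption here. The scale factor follows from the fact that kinetic energy, and hence temperature, is proportional to the square of the velocities.

```python
import math


def target_temperature(step: int, n_steps: int, t_initial: float, t_final: float) -> float:
    """Linear ramp: t_initial at step 0, t_final at step n_steps (assumed schedule)."""
    frac = min(max(step / n_steps, 0.0), 1.0)
    return t_initial + frac * (t_final - t_initial)


def velocity_scale_factor(t_current: float, t_target: float) -> float:
    """Factor to multiply all velocities by so the kinetic temperature becomes t_target.

    Temperature is proportional to v**2, so velocities scale with the square root.
    """
    if t_current <= 0 or t_target < 0:
        raise ValueError("temperatures must be positive")
    return math.sqrt(t_target / t_current)


# Values from the tutorial: TMPI = 1, TMPF = 300, 5000 steps.
for step in (0, 2500, 5000):
    print(step, target_temperature(step, 5000, 1.0, 300.0))
```

Halfway through the run the target is 150.5K, and each step raises it by a little under 0.06K.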
Next we will edit the Symmetry Parameters: go to the right of that line and click on
Display. Change the entries for XTRA, YTRA and ZTRA to 60 and erase the entries for
XTR2, YTR2 and ZTR2. The box has relaxed to its correct size and it is now fixed. The rest
of the parameters are left as they are (Ewald box is checked, DTOL is 0.0001, and GRDX,
GRDY, GRDZ, the grid sizes used by Particle Mesh Ewald for long-range electrostatic
calculations, are set to 32). Click OK.
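To get a feeling for what the grid sizes mean, here is a minimal sketch that computes the spacing between grid points along one box edge. It assumes that the box lengths are given in angstroms and that the grid is uniform; the function name is hypothetical and not a MOIL routine.

```python
def grid_spacing(box_length: float, n_grid: int) -> float:
    """Distance between neighboring grid points along one box edge (uniform grid assumed)."""
    if box_length <= 0 or n_grid <= 0:
        raise ValueError("box_length and n_grid must be positive")
    return box_length / n_grid


# Values from the tutorial: XTRA = 60 (assumed angstroms), GRDX = 32.
spacing = grid_spacing(60.0, 32)
print(f"{spacing:.3f}")  # 1.875
```

The same number applies along y and z, since YTRA, ZTRA, GRDY and GRDZ have the same values here.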
We move on to the Constraint Parameters and click on Display on the right. The
Constraint Parameters window shows up. The present simulation is with the protein. We will fix
the length of all the bonds in the protein by checking the box on the right of SHKB in the
first line. We also make sure that MSHK is checked (it should be from the previous
setting of the file). Erase all text from the rest of the worksheet. There is still something
we need to do. If we go ahead and do the simulation as is, the protein will diffuse in the
water box, moving its center of mass away from the center of the box. It can continue to
the extreme (and many of our simulations are long enough to see it) in which the protein
simply diffuses out of the water box, which is not good. We therefore simulate the protein
with constraints on its center of mass position and on the overall orientation of the
protein molecule (the latter is convenient for visualization). These constraints should not
affect the internal dynamics of the protein molecule, which is what we are interested in.
We will impose the center of mass and orientation constraint with the ORIEnt option.
(Another option for fixing the center of mass of the protein at the center of the water box
is the Tether option, which adds a spring to the center of mass of the protein. In this
simulation, however, we will go with ORIEnt.) ORIEnt will not work if the protein
changes its shape greatly during the calculations (e.g. in protein folding) since an
assumption is made that the fluctuations of its structure compared to the original structure
are small. If the fluctuations are large, Tether is a better option than ORIEnt. Click on the
Edit on the right side of ORIEnt. It is not necessary to select all protein atoms to fix the
protein in the box, and the simplest and cheapest solution is to apply this constraint to the
alpha carbons (CA) of the protein. Select (Highlight) any single CA atom that you see on
the right of the new window of Pick Lists. Then click Insert. In the selection line you will
now see chem prtc CA which means we have picked any particle (atom) with name
“CA”. Click OK. In the Constraint Parameters window click on Close. In the Dyna
Information window click on Run Locally. This is another run that will take a while. Be
patient (and one cup of coffee should have been enough).
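The idea behind holding the protein in place through its CA atoms can be illustrated with a small sketch. It covers only the translational part (keeping the center of mass of the selected atoms at a fixed point) and says nothing about how ORIEnt handles the overall orientation. It is not MOIL's implementation; all names below are hypothetical.

```python
def center_of_mass(coords, masses):
    """Mass-weighted average position of a set of atoms."""
    total = sum(masses)
    return tuple(
        sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)
    )


def recenter(all_coords, masses, selected, target=(0.0, 0.0, 0.0)):
    """Translate every atom so the center of mass of the selected atoms sits at `target`.

    `selected` holds the indices of the atoms that define the constraint (e.g. the CA atoms).
    """
    sel_coords = [all_coords[i] for i in selected]
    sel_masses = [masses[i] for i in selected]
    com = center_of_mass(sel_coords, sel_masses)
    shift = tuple(t - c for t, c in zip(target, com))
    return [tuple(x + s for x, s in zip(atom, shift)) for atom in all_coords]


# Toy system: two "CA" atoms (indices 0 and 1) and one other atom.
coords = [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 10.0, 10.0)]
masses = [12.0, 12.0, 16.0]
shifted = recenter(coords, masses, selected=[0, 1])
print(shifted[0], shifted[1], shifted[2])
```

Only the selected atoms define the shift, but every atom is moved by it, so the internal geometry is untouched.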
Step 3: Production run. By now the run has finished. The next phase is actually quite
similar to Step 2. We extract the last structure of the equilibration run and use it to
initiate a run of moving protein and water now at 300K that we can analyze in many
ways. This trajectory should be representative of the way the protein fluctuates thermally
in nature (so we hope). Select Analyze/Convert Coordinates/Reload Config File. In the
new File Open window highlight 2BW9_ccr.in and click OK. The window of Convert
Coordinates opens. We can run this file “as is”. Click on Run Locally. We overwrite the
older equilibrated structure (the one in which we allowed only the water to move), but this
is fine since we do not need that file anymore. We get a familiar report about the success
of the ccrd program.
We go back to Calculate/Dynamics/Reload Config File and highlight the file
2BW9_dyn.in in the File Open window. As always click OK. In Dyna Information change
the number of integration steps (#ste) to 20000. We are going to make a more serious run
now in which the total number of steps will be 20,000 and the number of equilibration
steps will be kept at 5,000. We also go a little lower and change TMPI from
1 to 300. The run is now at a constant temperature of 300K instead of gradually increasing
the temperature from the cold of outer space (1K) to the warmth of room temperature
(300K). During the equilibration period the velocities are re-scaled to the kinetic energy
that corresponds to 300K. Click on Run Locally. This is a short set of instructions but the
run will take considerably longer.
Use zmoil to watch the set of structures stored in the *dcd file. Find the display option
that will allow you to present the carbon monoxide molecule (only) as two space filling
spheres. It is evident that during our short simulation the ligand just bounces a little in the
heme pocket. One of the puzzles in the function of this protein (myoglobin) is the way
the ligand enters (or leaves) the binding site from the solvent. Drawing the protein with
a space filling model illustrates that there is no obvious way for the ligand to enter the
binding site and the protein must execute thermal fluctuations to allow for this event to
happen. Experimentally the escape events require tens to hundreds of nanoseconds to
occur (10^-7 seconds). The simulation we just ran was for picoseconds (10^-12 seconds). It is
therefore unlikely that we will succeed in probing these events in a straightforward way. This
underlines the need for a significant increase in computer power and/or for new computational
methods to address long time phenomena in biophysics. Our group is at the front of this
research and computational methods that are unique to moil or were developed under the
umbrella of this program address exactly these problems (calculation of reaction
coordinates with action optimization -- codes: chmin and sdp, the LES methodology –
codes: everywhere, and the Milestoning approach – codes: fp and memeqns). In the next
set of blogs I will illustrate the use of some of these techniques.
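To put the gap between the simulated and the experimental time scales into numbers, here is a back-of-the-envelope sketch. It assumes a 1 femtosecond integration time step, which is a common choice but is not stated in this blog; the helper name is hypothetical.

```python
FS_PER_NS = 1_000_000  # femtoseconds in one nanosecond


def steps_needed(duration_fs: int, timestep_fs: int = 1) -> int:
    """Integration steps needed to cover `duration_fs` (1 fs time step is an assumption)."""
    if duration_fs < 0 or timestep_fs <= 0:
        raise ValueError("duration_fs must be >= 0 and timestep_fs must be > 0")
    return -(-duration_fs // timestep_fs)  # ceiling division


production_steps = 20_000                      # the run described above
target_steps = steps_needed(100 * FS_PER_NS)   # 100 ns, a typical escape time
print(target_steps)                            # 100000000
print(target_steps // production_steps)        # 5000
```

Under this assumption a single 100 nanosecond trajectory is 5,000 times longer than the run we just made, and one such trajectory would still show only a handful of escape events at best.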