A tutorial for preparing MOIL simulation from a PDB file

A user blog

Here is a not-so-quick introduction to how we can pick a protein structure from the PDB (Protein Data Bank) and process it into a useful form for moil calculation. It is assumed that moil is installed. [Most users will have installed pre-built versions of MOIL from http://clsb.ices.utexas.edu/prebuilt/, in which case the following reference to compilation of source code is not necessary.] We will allow a few glitches in installation as described below, but the starting point is the existence of a moil directory and a successful compilation of the source code with one of the make_distribution files appropriate for your computer (or a manual copy of the relevant exe/binary files to the moil/moil.source/exe directory). If you are not there yet (e.g. the directory moil/moil.source/exe is empty; in Windows it is the directory moil/moil.exe for the moment), go to https://wiki.ices.utexas.edu/clsb/wiki, read the text under the links GetPackagedMoil, GetSubversionMoil and BuildingMoil, and follow the instructions.

We will look into hydrogen building, energy calculations, and construction of a solvation box. We will NOT look at some unique computational tools that were introduced by the MOIL team, such as LES or reaction path calculations. These blogs may or may not come later.

Ok. Here we go. Below I type between “…” commands that are typed with the keyboard and/or into specific text windows. Mouse clicks will be described with italics. Computer text is either with “…” or underline.

We start by selecting a protein structure from the PDB. I am addicted to Firefox, but a similar procedure is expected to apply to Explorer/Safari or whatever your favorite browser is. Type in the url of the PDB “http://www.rcsb.org”. Once the web site is loaded, type in the search box (select PDB ID or keyword) “mbco”. This search is followed by a list of possible hits.
Among the many myoglobin structures found, we shall use the entry 2bw9. There are several icons near the 2BW9 text. Click on the icon to download the file 2bw9.pdb and move the file to the directory you wish to use for this project (on my home Mac it is the directory “~/work/get_started”).

The first thing we want to try is to visually inspect the structure. We will use zmoil for that purpose. Note: the following description instructs you on the use of the MOIL graphical user interface as a means to launch the molecular-viewer application Zmoil. However, Zmoil may also be launched independently for viewing standard file formats such as PDB. See the Zmoil documentation for more information.

[Windows users may launch MOIL from the Desktop shortcut and/or the Start Menu after having installed MOIL. Where the following discussion refers to running moil from the command line, you may simply run MOIL as usual from your icon, and when opening files within MOIL, simply browse to the folder containing the desired file.]

We go to the directory with the PDB file “cd ~/work/get_started”, then type “moil.tcl” (moil requires tcl/tk installed on your computer, unless you are using the Windows version of moil). It is assumed that you added the directory ~/moil/moil.gui to your path. If not, you will get the message moil.tcl not found (just what happened to me now). So, being lazy, I try the most straightforward approach and type the whole path ~/moil/moil.gui/moil.tcl (for future reference you probably should edit your .cshrc, or your equivalent default shell startup file, to add ~/moil/moil.gui to your path). [Windows users may also add the moil.gui folder to their path if they wish to run MOIL from the command line. Windows users should simply type “moil” rather than “moil.tcl” to launch MOIL from the command line.]

Success!! The moil.tcl window appeared (shown below) with a top bar with a few options to choose from and a so-so picture of the gramicidin ion channel.
At the bottom of the moil window there are links to our site and e-mail. You can use these links to read more on the research conducted with moil and on its development. You can also e-mail us about bugs, which we will happily hunt down, and we will respond as quickly as possible. Note however that this is free code and we are not paid to guide potential users through the application code. You may wish to contact Ron Elber [email protected] if you need extensive support.

Ok. Let us first look at the PDB structure that we just downloaded. Click on View Structure and select New Config File. A new page appears with a lot of text boxes to fill (shown below). Do not forget your credit card number and billing address (just kidding…). You can click at the lower right corner of the page and drag the mouse to enlarge (or decrease) its size. The default size is rarely optimal.

To start we will make only minimal selections. Erase the text “.wcon” from the entry on the top (the “Connectivity File”). Go to the line below (marked Coordinate File) and click on Browse on the right side of the same line. Drag the window with the mouse clicked at the lower right corner if some of the text is hidden. Highlight (click on with the mouse) the file 2bw9.pdb and then click ok. Choose how the protein will be drawn by selecting options next to the line Display Mode: choose Stick, and to add some flavor to the picture click also on the button on the left of Secondary Structure.

Use the scroll bar at the right side of the window to scroll to the end of the page. There are a couple of action buttons below. DO NOT click on Quit, and try instead clicking on View (zmoil). The dialog that first appears gives a crash course on the function of the mouse in zmoil. It is useful to know that zmoil works best with a mouse with at least three buttons. The left is to rotate the molecule, the right is to translate it, and the middle to zoom.
The middle and right buttons can also be used to scroll the UI panels, such as the long panel on the left: moving your mouse over the panel and either scrolling with the middle mouse wheel or right-click dragging will expose additional UI. On the Mac mouse (one button only) a click is a left click, and a right click is CTRL-click. You may choose never to see this message again by clicking on the left of the text Do not show this message again. Otherwise just click ok at the bottom of this message and move on to the next paragraph.

By now you should have a picture of the screen as above (if not, here is your opportunity to write your first bug report!). Note the extensive clickable options on your left. I will let you explore the visualization options on your own. Some of the more advanced features include printing an image, measuring a torsion angle and beautifying the picture. If you scroll down the instruction windows on the left (via mouse-wheel or right-click dragging) you will discover a submenu called Surface Editor. On the right side of the surface editor you will find a button Display surface. Click on it. It will take a few seconds to compute the surface and then it will be displayed as covering half of the molecule. Rotate the molecule with the left mouse button to have a better appreciation for the displayed surface.

Close zmoil (by pressing ESC or clicking the window-close icon) when you are satisfied. Let us now try to process the PDB file and prepare the necessary input file for simulation. Go to the upper left corner of the main moil window, click on assemble and select process PDB. Another worksheet shows up (as shown below). Note that in the lower left part of the new page there is a clickable spot Use default Parameters. The intention is to reduce complexity when it is not necessary.
We will first use the simpler procedure; however, we will need some extra parameters (and complexity) later (the image below is produced when you click the Default Parameters button):

The first active line of the new page starts on the left with PDB file. Click on Browse on the right side of the same line. A file selection window appears. Select the PDB file of interest by highlighting it with the mouse, 2BW9.pdb (or whatever slight variation your file may be named), and then click on OK. The file selection window disappears, and new text appears in various entries in the Process PDB worksheet. After you pick your PDB file, necessary files are assigned default names. For example, your molecule is now named 2BW9. You can change those file names if you wish but you do not have to, and it is good practice to leave them as they are for easy tracking in the future.

Go now to the bottom of the page and click on Run Locally. If you expect to receive a message of the type “All is well, continue to next level” you are a true optimist. Unfortunately working with MOIL requires you to be a realist. As a rule of thumb, the conversion from a PDB to MOIL internal format and the generation of the appropriate file for energy calculation never works the first time, and the present case is no different (sometimes it does not work even the fourth time, but as long as progress is made do not despair).

The messages we receive this time are divided between two windows. The more serious problem is also the more threatening picture (with the hand signaling you to stop). The message is “CMO” not in /Users/ron/moil/moil.gui/defaults/ALL.MONO. A brief explanation of the way MOIL interprets PDB files is now in order if we are to overcome this particular barrier.

This tutorial was written in September 2008. Since that time, CMO has been added to the MOIL monomer database, so the “CMO” problem described will not occur.
The following explanation is left here for reference, since similar problems are likely to be encountered by MOIL users. Since CMO is in the MOIL monomer database, you will not need to edit the PDB file as directed below.

MOIL reads the residues (or “monomers” in MOIL’s lexicon) from the PDB file and tries to match these monomers with monomers that already exist in the MOIL database. The latter are fully parameterized and characterized. The database of MOIL is quite extensive and includes (for example) all the amino acids, water models, models of nucleotides, ions, and a few small molecules that serve as ligands. You may wish to have a peek at this file by clicking on edit in the line starting with Monomer File in the Processing PDB worksheet (DO NOT change it). However, the database is obviously not complete. The program relies on the monomer information to know which atom is connected to which, how strongly, etc. If no match is made, there is a problem: it is not possible to define an energy function for a group of unidentified atoms.

In the PDB file at hand, there exists the monomer CMO. Do you know what it is? Neither did I! And of course MOIL is clueless as well. MOIL is trying its best and the message about CMO is (translated to English): “Sorry, I cannot continue, however, if you insist the best I can do is to remove this offending residue and work with the rest”. You may answer “yes”; then the program will remove the CMO and you can forget about it ever after. However, is this what you want? Perhaps CMO is where the most interesting action takes place. Actually the mysterious CMO is important. It stands for Carbon MonOxide, which is the ligand whose diffusion we wish to follow and investigate. So we had better keep it. It is time to stop following the automaton and to use superior human intelligence. Click on Stop. Not surprisingly the next message is Could not process PDB file… Click close.
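
The matching step just described can be imitated with a few lines of Python before MOIL is even started. This is only a sketch, not MOIL's own code. It reads residue names from columns 18-20 of ATOM/HETATM records (the standard PDB layout) and reports the names missing from a set that you supply. The set `known` is a hypothetical stand-in: the format of ALL.MONO is not described here, so the sketch does not try to read it.

```python
def residue_names(pdb_lines):
    """Collect the residue (monomer) names used in ATOM/HETATM records."""
    names = set()
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 20:
            names.add(line[17:20].strip())  # PDB columns 18-20
    return names


def unknown_monomers(pdb_lines, known):
    """Return, sorted, the residue names that are not in `known`."""
    return sorted(residue_names(pdb_lines) - set(known))


def report(path, known):
    """Print the unmatched residue names found in the PDB file at `path`."""
    with open(path) as handle:
        missing = unknown_monomers(handle, known)
    print("not in the supplied list:", ", ".join(missing) or "none")
```

Calling `report("2bw9.pdb", known)` with your own set of monomer names lists every residue name in the file that the set does not contain, which is a cheap way to spot a surprise like CMO before running Process PDB.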
We now have two windows: (a) *_pdb.log (the * is a wild card) and (b) Processing PDB. We do not need (a) to continue, but this is a good point to explain what it is all about. In the *.log we see many lines that look like

    Dropped: ATOM 239 CD1BTRP M 29 37.474 12.072 15.406 0.50 30.77

This is a copy of a line from the PDB of an atom CD1 from the residue TRP. The atom CD1 got an extension to CD1B since the side chain has multiple conformations: among the large number of molecules in the crystal (Avogadro or some such), some are found in alternate conformations. The multiple conformations add up to a more complex spectrum that can still be interpreted by having TRP in more than one conformation. Obviously a single molecular model will have only one side chain for a particular amino acid (actually the LES code of MOIL allows you to retain more than one copy of a side chain, but this is another story). At present MOIL makes an arbitrary choice and keeps only the first conformation listed in the file. The atom of TRP that will be kept is CD1A, while CD1B etc. will be dropped. You may have a different opinion on which atoms are needed. If you do, you will need to edit the PDB file yourself and keep the conformation close to your heart. At present MOIL does not allow for choices of side chain conformations through the graphic interface. In the present exercise we will let MOIL choose, and therefore the output of *.log is not alarming.

We need to do something about CMO. Lucky for us there is a simple solution. A carbon monoxide “monomer” already exists in ALL.MONO. However, it is not called CMO but CO instead. If we slightly edit the PDB file we should be all set. Let us look first at the definition of CO in moil. Click on edit at the right side of the Monomer File line in the Processing PDB page (note that if you double click on Monomer File a new window will appear with a short explanation of what the monomer file is all about).
A new page opens with three internal windows titled Monomers, Particles, and Bonds, as shown below. If you scroll the sidebar of Monomers you will find CO close to the bottom. Click on CO. In the other windows you will “discover” that CO consists of two particles named C and O (the particle types are CM and OM) and that the C is bonded to the O. You will also note that CMO, newly added since this was originally written, lives just next to CO in the Monomer list. If you click on it, you will see that it is defined identically to CO.

Particle types are used to define the physical and chemical properties of an atom (or particle). For example the carbon and the hydrogen attached to it in a benzene ring are one atom type (say CH). All atoms in benzene are of the same type, and the types are used to identify the energy terms. However, each of the atoms must have a unique name within the monomer, for example C1, C2,…,C6 in benzene. In the monomer CO the unique particle identifiers are C and O while the types of the atoms are CM and OM. The atom names C and O must match the names in the PDB file.

Let us edit the PDB file now. We want to keep ALL.MONO as steady as possible and not follow all the fluctuations in different PDB files. In principle it should be possible to code into MOIL all the known monomers that appear in the PDB and do a better job of identifying different monomers. This task is however more painful than it may seem. The PDB is rapidly moving against us and constantly generates exciting new names for monomers to confuse MOIL and other similar programs. There is no standard for monomers other than amino acids and nucleic acids, and that is a problem. We know when to admit defeat, and at this point in the space-time continuum we have decided to leave some monomer adjustments to the users and give MOIL a break. Go back to the page Processing PDB and click on edit at the right hand side of the line PDB File.
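
For readers who would rather script their PDB edits than type them into an editor window, here is a minimal Python sketch of the two edits this tutorial talks about: renaming a residue (CMO to CO) and choosing which alternate conformation to keep. It relies only on the standard fixed-column PDB layout (alternate-location flag in column 17, residue name in columns 18-20). The function names are made up for this example and are not part of MOIL. The alternate-location flags of the kept atoms are left untouched; that MOIL then accepts the remaining conformation is an untested assumption.

```python
def rename_residue(line, old="CMO", new="CO"):
    """Rename a residue in one ATOM/HETATM record without shifting columns."""
    if len(new) > 3:
        raise ValueError("PDB residue names have at most three characters")
    if line.startswith(("ATOM", "HETATM")) and line[17:20].strip() == old:
        # Pad on the right, as in the manual edit ("CO" followed by a space).
        return line[:17] + new.ljust(3) + line[20:]
    return line


def select_altloc(lines, keep="A"):
    """Keep atoms with a blank alternate-location flag or the flag `keep`."""
    kept = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) > 16:
            if line[16] not in (" ", keep):  # PDB column 17
                continue  # a different conformation of this atom: drop it
        kept.append(line)
    return kept


def edit_pdb(src, dst, old="CMO", new="CO", keep="A"):
    """Write an edited copy of `src` to `dst`; the original is left untouched."""
    with open(src) as handle:
        lines = handle.readlines()
    lines = [rename_residue(line, old, new) for line in select_altloc(lines, keep)]
    with open(dst, "w") as handle:
        handle.writelines(lines)
```

Something like `edit_pdb("2bw9.pdb", "2bw9_edited.pdb")` (the output name is arbitrary) writes a new file, so the downloaded original stays available if the edit turns out to be wrong.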
An Editor window appears, as shown below, that makes it possible to adjust the open file. Use the right hand side bar to scroll down and find the offending CMO residue/monomer. It comes immediately after the HEM (which stands for heme). Change the residue name from “CMO” to “CO” (after removing the “M” do not forget to add a space after the “O” so that the columns of the rest of the line still line up; the file has a fixed format). Now the residue name matches the corresponding name in MOIL. The names of the particles/atoms C and O already match the names of the particles in ALL.MONO, so there is no need for a change there. Save the file by clicking on Save.

Back to Processing PDB. We have made progress, but I would be lying to you if I claimed that we are done or even almost done. This is despite the encouraging message we will receive following our next action. Let us try to process the PDB file again by clicking on Run Locally at the bottom of the page. It is with a deep sense of satisfaction that I inform you that a little red window shows up telling us that it is working, and then it disappears. Yet another small window follows which announces something like bla bla bla bla bla… has been successfully generated. A fresh blue button is seen below the above text with the tempting-to-click continue. Do it, press on continue. Everything suddenly disappears and only the moil.tcl menu remains. Do not worry! Nothing crashed! It is just the way MOIL lets you know that you have graduated and can now step into a more advanced class.

Let us illustrate how we can now pursue a quick and dirty energy calculation. Click on Calculate in the top bar of the only moil window left and select Energy and then New Config File. The second option, Reload Config File, is for cases in which we had to cut our session short and run to a class or something else. Moil saves the changes we have made in a file, so we can reload the session and continue from the point where we stopped once we return.
We will not Reload this time since the tutorial sessions are minimal. For a sequence of mouse selections I will sometimes use a “/”. E.g. the last case could have been written Energy/New Config File.

The new worksheet of interest is Energy Information and is shown below. We are still working with default parameters, so only truly essential information is included. The two Input files on the top of the page were created by our previous adventure (not shown in the figure). The 2bw9.wcon file is the connectivity file that contains all the information required for energy calculations (like bond lengths, bond force constants, etc.). You may view this file (but not edit it) by clicking on the view button on the right side of the Connectivity File line. You may also read more on the connectivity file by quickly double-clicking on the Connectivity File text. A new window will appear with more information on this file (this works for all named parameters).

The second input file, Coordinate File, includes the coordinates appropriate for energy evaluation within MOIL. Some atoms (like hydrogens) that are used in the energy calculations are not present in the PDB files. Our “processing” of the PDB file generates new coordinates for these atoms and the results are written in “crd” format. The major output file of energy is below the input files and is called here by a default name, 2bw9.wene. Note that the images of windows were taken on a Windows machine and therefore the directory names do not correspond to the names I listed in the text.

To create the energy output file let us run the energy program by clicking the Run Locally button at the lower end of this page. We receive a message that the program is running and then the blue continue button appears. By now you should know what to do with the continue button. A final message pops up when the program has finished.
The new file that is opened when the calculation is all done is 2bw9.wene, and it includes the energy listing as shown below. It looks decent, which may raise our hopes. After all, once the *.crd and *.wcon files are generated we can do many of the MOIL calculations in a straightforward way.

It also does not look like it was too hard so far (or was it??). However, the connectivity file we generated is wrong and we need to revisit the way we generated it. We would also like to put our protein in a box of water, so there is still a considerable body of work to be done, and the next paragraph is a good place to restart the process. We quit the Energy worksheet and we are now ready for our next adventure.

Here is another piece of information that we did not think about in our first attempt. In myoglobin we have a prosthetic group (the heme) that is covalently bonded to the proximal histidine. It may be bonded to the ligand as well, but this is not what we are after. We wish to simulate the diffusion of the unbound ligand. The bonding of the heme challenges the way MOIL generates the covalent structure of the molecule and we need to help it a little. MOIL generates covalent bonds between monomers that it recognizes as being part of a polymer. If we give it a sequence of amino acids it will recognize that they bind sequentially, but how should MOIL recognize that the heme binds to one of the histidines in the amino acid sequence? MOIL is not that smart and we must tell it where the binding takes place. The facility for doing that is “addbond”. If we do not use this facility (and in the beginning, with default parameters, we did not), no covalent bond between the heme and the peptide chain is constructed and the bonding is incorrect.

We start by opening the Process PDB worksheet one more time. Unclick Use default parameters to get the full list of variables for your consideration.
In the line of additional binding, type “2bw9.addb” in the space available and then click on Edit. MOIL complains that it cannot find this file. Nevertheless it still does what it is expected to do and opens a blank edit page. At this point it may help to read the manual or some such. There are no buttons to click. You may consider visiting the directory moil/moil.doc/gui and reading the file “special”, which is quite old. For a more recent file go to the examples (or tests) directory at moil/moil.examples/ [if your distribution did not come with a moil.examples folder, you can find one at the download location given at the beginning of this tutorial]. The directory of interest is moil/moil.examples/myo and the file with the information we are now after is mbco.addb. Whether or not you view the examples, let me type the short answer below and then explain how it came about.

    bond chem HIS 94 NE2 HEM1 156 FE
    *EOD

There are only two lines in the addbond file. The first one is the only line that has a functional value. It declares a “bond” between a “chem”ical group (or a monomer) “HIS” number 94, atom NE2, and a monomer “HEM1” number 156, atom name FE. The second line, *EOD, is just an indication to the program that the list of bonds is finished.

Obviously, to write this line we need to know something about the system at hand. Well, everyone who has read a beginner biochemistry book should know about hemoglobin, heme, and the fact that the proximal histidine is attached to the heme. That much is a requirement before simulating a protein of the globin family using MOIL (sorry for not letting you know earlier). What can add a little to the complexity is the numbering of the monomers (above we have numbers 94 and 156). The numbers of the protein data bank file are NOT what we want. There is a reason for our processing of the PDB file.
Not only do we add more atoms like hydrogens (which changes the numbering of atoms), but the numbering of the residues/monomers within the protein molecule also changes. MOIL has special monomers for the N-terminal and C-terminal called NTER and CTER respectively. If you do not trust me on this, check the crd file which was generated before. Since we already ran the “Process PDB” procedure, a crd file exists. In the “Processing PDB” worksheet click on view on the right of the line titled Coordinate File. A new window opens that displays the coordinates that MOIL generated from the PDB file, as shown below.

The format is compatible with the crd format of the program CHARMM. The first two lines, which start with a “*”, are comments. The third line has a single integer which is the total number of atoms in the file (and of the molecule; this number must match the number of atoms in the connectivity file (*.wcon)). The fourth and the fifth lines are two hydrogen atoms covalently bonded to the first nitrogen of the polypeptide chain that are assigned to the monomer NTER (N terminal). They are designated as residue/monomer no. 1. Hence, all the other indices of the monomers that follow are shifted by one with respect to the PDB indices. Of course it is assumed that no more adjustments to the residue index will happen later.

An example of possibly confusing and inconsistent numbering is the following: some PDB files include a ligand (e.g. carbon monoxide) as a separate residue, others include it as part of the HEME monomer. Obviously the files with inclusion or exclusion will have a different index for the carbon monoxide. In principle a code could be written that relates the PDB index to the index of the crd file. Such a code would make it possible to use the indices from the PDB in the addbond file. Unfortunately, this code has not been written yet. We know that the proximal histidine has an index around 90.
We may even identify the histidine bound to the HEME by looking at the PDB file using ZMOIL graphics. The bottom line is that without experience and extensive knowledge of addition and subtraction we cannot pick the residue number from the PDB. It is safer to pick the numbers directly from the processed files of MOIL, avoiding potential errors and the use of the expertise mentioned above. If we look directly in the PDB, the histidine index is 93 and the HEME index is 154, instead of the 94 and 156 which I typed above.

Note also that the HEME is called HEM in the PDB and HEM1 in the crd file. The reason is that different equilibrium configurations of the heme are possible. The heme’s nitrogens and iron form a plane if the iron is coordinated to six atoms (bonded also to the ligand), and the iron is slightly out of the plane of the heme if it is coordinated to five atoms only (no bonded ligand). We are interested in the last case (unbound ligand), which will allow us to explore ligand diffusion.

So, how did I identify the indices of the relevant binders? One way of doing it is to look at the connectivity file. The file we generated without the extra bonding is wrong bonding-wise, but its monomer indices are still correct. We find the histidine whose index is close to 93 (HIS 94) and the index of the heme group (156). We realize at this point that the generation of the wrong connectivity file (even without the extra bonding to the heme) was not a total waste. We are using the *.wcon file to figure out the right indices for the required addbond file. Of course we cannot exclude the possibility that two HIS are near each other in sequence, in which case the procedure above will give a wrong answer for the index of the proximal histidine. A safer approach is described below.

Another option, which we will follow now, is to use the ZMOIL graphics to identify the two monomers.
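
Before turning to the graphics, note that the same question (which histidine sits next to the iron?) can also be put to the crd file with a short script. The sketch below is hedged accordingly. It assumes that comment lines start with “*”, that the first non-comment line is the atom count, and that each atom line holds whitespace-separated fields in the CHARMM order: atom number, monomer number, monomer name, atom name, x, y, z. This matches the description of the crd file given earlier but has not been checked against actual MOIL output, and plain whitespace splitting can fail if columns run together in very large files. The function names are invented for this example.

```python
import math


def parse_crd(lines):
    """Yield (monomer_index, monomer_name, atom_name, xyz) for each atom line."""
    body = [ln for ln in lines if ln.strip() and not ln.lstrip().startswith("*")]
    for ln in body[1:]:  # body[0] is assumed to be the atom count
        f = ln.split()
        yield int(f[1]), f[2], f[3], (float(f[4]), float(f[5]), float(f[6]))


def histidines_near_iron(lines, heme="HEM1", cutoff=6.0):
    """List (distance, HIS index, heme index) for HIS NE2 atoms near FE."""
    atoms = list(parse_crd(lines))
    irons = [(idx, xyz) for idx, mono, atom, xyz in atoms
             if mono == heme and atom == "FE"]
    hits = []
    for heme_idx, fe in irons:
        for idx, mono, atom, xyz in atoms:
            if mono == "HIS" and atom == "NE2":
                d = math.dist(fe, xyz)  # distance in the units of the file
                if d <= cutoff:
                    hits.append((round(d, 2), idx, heme_idx))
    return sorted(hits)  # nearest histidine first
```

With the file opened as `with open("2bw9.crd") as handle:`, the call `histidines_near_iron(handle)` returns the candidates sorted by distance. The 6 angstrom default cutoff is an arbitrary choice for this sketch, and the graphical check described next remains the way to confirm the answer.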
In the “moil.tcl” window click as before on View Structure and select New Config File. A new window appears. The 2bw9.wcon and 2bw9.crd file names should already be filled into the spaces near Connectivity File and Coordinate File. ZMOIL “learns” about progress in other MOIL branches and uses up to date crd and connectivity files. If the above two file names do not appear in the corresponding file lines, choose them via the Browse button, or type them in. Note that if you type them in, you must ensure that the gray button to the right of Coordinate File giving the file type is correct. For a crd file, the button should read CRD, and so on. Click it if it needs adjustment.

We are interested in the heme and the residues that are close to it. It therefore makes sense to make a selection. In the same workpage titled Cmoil structure 1 find the line Pick Display Center. In the box for Cutoff Distance type “4”. Atoms that are within 4 angstrom of the target (any heme atom, to be selected) will be included in the figure. Next, click on pick. A new “Pick Lists” window appears as shown below. Use the lower right corner to enlarge the window to full size if necessary and use the scroll bar in the middle to find the monomer/residue HEM1 (the residue number had better be 156). Highlight HEM1 and then click on insert. Then click OK. The selection window closes and we return to the Cmoil structure 1 worksheet. Go to the bottom of the page and click on View (zmoil).

Only a subset of atoms shows up. There are two histidines sufficiently close to the heme. This is not surprising: one of them is the distal histidine and the other the (desired) proximal histidine. Click on one of the atoms of each histidine with the left button of the mouse while holding the “shift” key (you may press ‘p’ to reset the picked atoms). One of the residue numbers that you will read at the bottom left corner of the graphic screen is 65 while the other histidine index is 94.
The heme index is 156. Since all this renumbering is done in the crd and wcon files, the indices hold for all MOIL applications. It is therefore not difficult to make the logical leap and conclude that the proximal histidine is indeed 94 (the proximal histidine comes after the distal histidine in sequence). Further examination of the picture reveals that the proximal histidine is better oriented to interact with the heme iron and that it is actually closer (you can measure the distance by choosing an atom from the histidine, then one from the heme, and pressing ‘d’ to display the distance; buttons are available for this in the left-hand panel). In conclusion, 94 and 156 are the indices we are after.

We comment that a similar procedure must be applied to generate S-S bonds (e.g. for a pair of bonded cysteines). The good news for heme bonding is that MOIL recognizes all the bonds and there is no need to generate a new set of parameters etc. However, it is possible that you will attempt to study a molecular system that consists of pieces not known to MOIL. In that case you will get a bitter complaint from MOIL that will kick you out. You will need to generate new entries in the ALL.MONO and ALL.PROP files. This is illustrated for the molecule BENZENE in a tutorial, but it is by no means an easy task in the general case. If you select Help and then tutorial 1 from the “moil.tcl” window you will be presented with the benzene example of this feature of MOIL.

So, this was a long story to explain one line. Are we there yet? Of course not! Here is the next barrier, which is the “Modification File” in the Processing PDB worksheet. Close ZMOIL, and once again open the Processing PDB worksheet if it is not still open (via Assemble/ProcessPDB). In the Modification File field, give the modification file the name 2bw9.modi. The modification file allows you to fix components of the covalent structure that were generated automatically by MOIL and that you do not like.
All angles, torsions, and improper torsions in MOIL are generated automatically from the bond structure. For example, any pair of bonds that share one atom defines an angle. This automated procedure is unique and saves a lot of work compared with defining all the covalent terms manually. However, sometimes MOIL makes a mistake and generates undesirable angles/torsions/improper torsions. So the “Modification File” is a way for the user to fix the bugs that MOIL creates. It may be possible to do a better job internally within MOIL and to catch all these special cases using computer code. However, for some esoteric systems the user may want to have the capacity to define the covalent structure differently. So the modification file is not only a bug but also a feature.

Choose “Edit”: as before you will get a complaint that the file does not exist, but if you hang in there a little longer and click on close, an “Editor” window will open allowing you to type into the new file. Here is what I typed in (and then clicked on Save):

    remo angl chem HEM1 156 NA HEM1 156 FE HEM1 156 NC
    remo angl chem HEM1 156 NB HEM1 156 FE HEM1 156 ND
    *EOD

The text is almost self explanatory. What we do is remove two angles from the list of angles that MOIL keeps for this molecule. The first line reads, after translating to English: REMOve ANGLe using CHEMical formulas; the angle we wish to remove consists of three atoms (all belonging to the residue HEM1). The angle is defined by three sequential atoms with the iron (FE) in the center, flanked on either side by two of the heme’s nitrogens, NA and NC.

Why do we want to remove these angles? This is a moderately long story so take a deep breath (or skip this paragraph). The covalent structure of the heme includes four nitrogen atoms (NA, NB, NC and ND) that bind to a central iron atom. The geometry is that of a cross and the deviations from a plane are not large even for an iron with a coordination number of five.
Obviously MOIL generates all the angles between bonds that share the iron atom. That is, all the angles of the type X-FE-X where X is any atom. For each angle MOIL assigns an energy term of the form $U(\Theta) = k(\Theta - \Theta_0)^2$, where $k$ is the force constant, $\Theta$ is the angle between the two bonds, and $\Theta_0$ is the equilibrium value. The equilibrium value and the force constant are estimated from a set of small molecule values. However, NA-FE-NC is (or almost is) linear. This is an unusual angle for organic-like compounds (which proteins are) and is not handled very well by the default energy formulation of angles that was mentioned above. In fact $-dU(\Theta)/dr$ (the forces) are singular when the three atoms are close to collinear (a division by a quantity that approaches zero). These angles are also not necessary to reproduce a reasonable heme geometry. The approximately right angles (e.g. NA-FE-NB) are doing this job. We therefore remove the close-to-linear angles that can cause a lot of numerical trouble. This is what the Modification File does. Are we done yet? Nope. The last item on the agenda of computing the connectivity file is the generation of a water box. Let us generate a box of 60 angstrom around the myoglobin molecule. Go to the bottom of the Processing PDB page and type “60” in each of the empty text boxes near XWBX YWBX and ZWBX, which mean the length of the X/Y/Z edge of the Water BoX, then click on Run Locally. You will see more red windows flashing on and off (there are a couple of programs participating in the generation of the new system of myoglobin + water. The fact that the windows do not stop and complain is actually reassuring). At the end you should see a window “Connectivity database bla bla bla … has been successfully generated”. Click on Continue. All the windows (with the exclusion of moil.tcl) disappear. This means that we are all set (again) to go to level 2. Let us first look at the monster we just generated by going again to the View Structure subprogram. We just need to select New Config File.
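To see where the numerical trouble with the near-linear angles removed above comes from, here is a sketch of the force derivation. It assumes the angle is evaluated from the dot product of the two bond vectors, which is the usual approach in molecular mechanics codes; whether MOIL computes it exactly this way is an assumption, and the symbols a, b and r_i are introduced here only for illustration.

```latex
% a, b : the two bond vectors that meet at the central atom (e.g. FE->NA and FE->NC)
% r_i  : the position of any one of the three atoms that define the angle
\begin{align}
U(\Theta) &= k\,(\Theta - \Theta_0)^2, \qquad
\cos\Theta = \frac{\mathbf{a}\cdot\mathbf{b}}{|\mathbf{a}|\,|\mathbf{b}|} \\
\mathbf{F}_i = -\frac{\partial U}{\partial \mathbf{r}_i}
  &= -2k\,(\Theta - \Theta_0)\,\frac{\partial \Theta}{\partial \mathbf{r}_i}
   = \frac{2k\,(\Theta - \Theta_0)}{\sin\Theta}\,
     \frac{\partial \cos\Theta}{\partial \mathbf{r}_i}
\end{align}
```

As the three atoms become collinear, sin Θ goes to zero and the derivative of cos Θ shrinks along with it, so the force is evaluated as a ratio of two very small numbers and its direction becomes ill-defined. That is the division hazard for NA-FE-NC and NB-FE-ND.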
A new page with the names of the necessary files will show up. Click on Secondary Structure in the Display Mode to make the picture a little “brighter” and go to the bottom of the page (yes, once more) and click on View (zmoil). You will see a beautiful water box and a ribbon structure of myoglobin as shown below (if you do not, ensure that the option “Display Water” is enabled in the Display Options portion of the user interface). You can switch the picture to a space filling model to make sure that the protein is reasonably well covered with (red) water molecules. If you do, you will notice that there are a few white (protein) spots visible through the layers of water molecules, but we will let it go this time, rather than re-running with an even bigger box of water. Let us now make a short molecular dynamics trajectory. Close the Cmoil Structure 1 window and go back to the moil.tcl main window. Click on Calculate and select Dynamics and New Config File. A new window appears (what else?!) with a lot of parameters as shown below. At the moment we will change nothing. A few of the default values are already filled in. For example #STE is the number of integration steps and it is currently set to 100. A simple way of getting more information about these parameters is to place the mouse on top of their text (e.g. above #crd) (the mouse pointer will take the form of a hand), and then double click. A new window will show up with a short text explaining what this parameter is all about. For example double-clicking on #crd produces a new window with a short text as shown below. In fact the default value for dyna quoted there is not up to date. In the dynamics program dyna it is 1. Note that the input files in the Dyna Information workpage are the connectivity file (*.wcon) and the coordinate file (*.crd). The connectivity file which we just generated is displayed in gray. This means that you can browse the file but you CANNOT edit it.
Editing a connectivity file (unless you are a MOIL wizard) is likely to end in disaster, and is better avoided. Editing a crd file is also not recommended, but can be helpful sometimes if you wish (for example) to displace the carbon monoxide ligand to some other binding site. The output files are a *dcd file (which as in CHARMM contains the sequential coordinates from integration of the equations of motion), and a *dvd file, which is rarely used in MOIL and includes the velocities. The standard output provides a text report on the progress of the simulation. Anyway, let us try to cut it short and click on Run Locally. A new warning window appears claiming that “symm constraint is essential….”. It is right. I forgot to define a periodic boundary condition for the simulation, and the program wishes to check if I really want to simulate a borg ship of water and myoglobin in vacuum, or perhaps add the symmetry constraint. We can try to add the symmetry constraint “as is”. The program will use the 60 angstrom size that we used when generating the box to initiate the run. However this is not wise. Our box is not equilibrated yet, which means that we are not ready to use the desired size of the box. Just to exemplify the problem click on Yes (yes we DO want to have a symmetry constraint of a box of 60 angstrom). Windows will pop up and let you know that the program started. And in about a minute (depending on the power of your computer) another window will let you know that the program has finished. The graphical viewer ZMOIL will open, but just close it for now. You will also see a Viewer showing the text output from this last dynamics computation. Search for the pattern ENERGIES:. The information that immediately follows is a list of all the different energy terms. Note that the energy “E evsym” is all stars. The stars are not good. They mean that the energy exceeds the pre-determined output format of MOIL. This happens when the energy is very high.
In this particular case the “bad” energy is that of the Lennard-Jones repulsion between the primary and image boxes. The bad overlaps between the primary box and the images made the simulation unstable. It is necessary to go back and increase the initial size of the box to a size that is more acceptable energetically and slowly (linearly, and at each step) compress the box to the desired size. Close the viewer. We go back to the Calculate/Dynamics/New Config File worksheet but this time we go to the line of “Symmetry Parameters” and we click on Display on the right side of the line. A new window will appear with a set of parameters to consider. The lines we need to touch are the first two: SYMM and SYM2. The SYMM line defines the box size that we want to start with. The SYM2 line defines the box size that we want to finish with. In the SYMM line we type “64” for XTRA, YTRA, and ZTRA, and in the SYM2 line entries XTR2, YTR2, and ZTR2 we type “60”. Then click on close. In the main dyna window type “100” on the right side of #EQU: we will need some EQUilibration to adjust the box size. After setting the parameter #EQU click on Run Locally as we have done many times by now. The program will display two windows: one to let you know that it is starting and a second to let you know that it has ended. Once complete, the graphical viewer ZMOIL and the text output viewer will be displayed. We could explore the motion graphically at this point, but we’d like to set up some special viewing options, and you’d better learn how to bring up the viewer from scratch, so just close the viewers. Now we are back with the lone window of moil.tcl. The final chord of this far-too-long, rambling-on-and-on tutorial is a display of the results of the dynamics. Click on View Structure and choose New Config File. We wish to look at dynamics so we need to change the selection of the file. Sometimes the program already inserts the filename and the Browse is not required.
However, you still need to make sure that the extension (the file type) is correct. Go to the right-hand side of the Coordinate File line; the file type should be DYNA (a dynamics file by default ends with dcd). If it is not DYNA, make it so. The interface recognizes the meaning of different file extensions (dcd: dynamics, crd: coordinates, pdb: protein data bank). If you have your own original extension name you can always choose the type of the file manually from the menu on the right. Needless to say, having the correct file type defined for your run makes a difference. Ensure 2bw9_dyn.dcd, the dynamics file we just computed, is listed as Coordinate File (or a different name if you’ve been choosing your own names!). If not, make it so by browsing or typing the name. Note: the “swap” button next to the file type should typically be on. MOIL binary files are written in “big-endian” format, a format once common on large mainframe computers, and still used by some processors. If you are running on an Intel or AMD processor, which is likely, we need to “swap” the order of the binary bytes as they are loaded to read the data correctly. One of the parameters of the dynamics run was #crd (the frequency of saving coordinate sets). In our run it was set to one. This means that for every molecular dynamics step we have a structure that was saved to the disk. Since the total run was 100 steps we have 100 structures. The program needs to know how many structures to expect from the file of coordinates that was obtained from dyna (it wants to know how many you would like to view, up to a maximum of the actual structures recorded to disk; if you are ever unsure, just enter a large number). Find the line Number of Structures and write “100” in the box on the right. As before we will click on the Secondary Structure button, which can be found in the line Display Mode. We also wish to view the ligand (carbon monoxide) more clearly.
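The byte swapping mentioned above can be illustrated with a small, generic sketch. It does not read a MOIL file; the value 100 and the variable names are hypothetical, and the only point is to show what happens when a big-endian integer is read on a little-endian machine with and without the swap.

```python
import struct

# Hypothetical example value: a 4-byte integer as it might be stored in a binary file.
value = 100

# Write the integer in big-endian byte order (most significant byte first).
raw = struct.pack(">i", value)

# Reading with the matching byte order recovers the value.
as_big = struct.unpack(">i", raw)[0]

# Reading the same bytes as little-endian (the native order on Intel/AMD)
# without swapping gives a meaningless number.
as_little = struct.unpack("<i", raw)[0]

# Reversing the four bytes before a little-endian read fixes it.
swapped = struct.unpack("<i", raw[::-1])[0]

print(as_big, as_little, swapped)
```

If a viewer shows nothing, or reports an absurd number of atoms or structures, a wrong setting of the swap box is a likely suspect.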
Therefore we will draw it using a space filling model. We move to the line Pick for Spacefilling. Click on Pick on the right. Scroll the middle bar and highlight the residue CMO (or CO if you changed it) (it is number 157 on the list). Click on Insert and then OK. We are back in the Cmoil structure 1 worksheet. At the bottom of the page click on View (zmoil). If the program does not display a structure go back to the Cmoil structure 1 worksheet and to the line Coordinate File. Check the box near swap. There are two ways of generating unformatted files, with and without byte swapping, and apparently the way the programs are set up now on my computer, this box should have been checked. Press View (zmoil) again to have a look. To enjoy the dynamics (it is short, only 100 femtoseconds, so do not expect anything dramatic to happen) go to the left to the sub-window called “Structure Animation Controls”. You may wish to click on movie mode or, even more entertaining, go to manual and use the diamond just below the title. Drag the diamond with the mouse at whatever rate you like to display the structures sequentially. Note that 100 femtoseconds are enough to see some ligand rotation (but not spatial diffusion). Ok. One more exercise due to popular demand:

More water, please (or advanced swimming class)

We now prepare a simulation of protein in water with long-range summation of electrostatic interactions (Ewald sum) and a series of other goodies to make our simulation more professional. We have done quite a bit of work so far, so why not build on it? Go to Calculate/Dynamics/New Config File. We will revisit some of the parameters and will find some new pits to fall into. The good news is that we will solve (eventually) all of the problems. Let us first outline a plan. Then, as in any war, the plan will survive only until the first shot. But it is still good to have a plan. We will have 3 steps: 1. Equilibrate the water, 2. Equilibrate the water and the protein, 3.
A production run.

Step 1: Our initial task is to equilibrate the structure of the water around the protein. The way we constructed the water box is by simple geometrical considerations. We used a pre-equilibrated pure water box of 60x60x60 angstroms, placed the protein in the center, and removed all the water molecules with significant overlap with any of the protein atoms. However, we did not test for correct orientation of the water molecules, and the empirical distance we used to remove the offending solvent molecules is somewhat arbitrary. For example, it is expected that the water molecules will be closer to charged residues compared to hydrophobic residues. Our initial procedure is one-distance-for-all and there is no differentiation between amino acid types. This non-optimal solution requires proper equilibration of the water molecules around the protein to obtain a sensible room temperature configuration. To start with we trust (completely) the crystallographer and use the protein “as is”. Only the water will be allowed to move. OK. Let us be specific and make the concrete changes on the worksheet. We start with Basic Parameters. First #ste: 100 steps is a bit low for serious equilibration work. Even the shortest relaxation times of liquid configurations are in the picoseconds and not in the femtoseconds. We still want to keep it reasonably short so make it #ste 5000. The #equ is the number of equilibration steps. Since we are now doing ONLY equilibration and no data collection, the number of equilibration steps should be made equal to the number of steps. Make it #equ 5000. The number of steps between printouts of information (INFO) should be set to 10. When we are ready to launch a looooooong simulation, printing every 10 steps is likely to be too much. However, in the beginning at least, it is good practice to print frequently.
It will be helpful to catch those bugs early in the day, before it gets dark and messy in complex and advanced simulations and the bugs become more elusive. Make it Info 10. The next item on the agenda is #crd, which is the frequency at which we save coordinate sets to the disk (to the file 2bw9_dyn.dcd). Let it be #crd 1000 (we do not need many structures here during the equilibration, just the last structure to continue the process). The variable #vel is the frequency at which velocities are written (to the file 2bw9_dyn.dvd). Velocities are of no use to us at the moment and we therefore will not change the default number (which should be zero: no velocities are written). The variable #lis is the frequency at which the lists of non-bonded interactions are updated. Since we will use Ewald (and a short real space cutoff distance) we should update quite frequently, which means in this case #lis 10. The number should be bound between 5 and 20. Less than 5 does not make sense, since the update of the lists is computationally expensive and in fewer than 5 steps the system hardly moves at all. More than 20 steps is asking for trouble. In 20 steps the system can change enough that abrupt violations of energy conservation will be observed. We will not touch most of the upcoming variables, so from now on I will discuss only variables that we modify. The variable RVMX is the van der Waals distance cutoff and we can safely set it to 8.0 instead of the default, which is now at 9.0 (angstrom). The other distance cutoff that we touch is RELX (cutoff for real space electrostatic interactions), which we set to 9.0 instead of 12.0. An important thing to remember about the van der Waals and electrostatic cutoffs is that MOIL requires the electrostatic cutoff to be larger than the van der Waals cutoff. We cannot have (for example) RVMX equal to 12 while RELX is equal to 9.
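Since the rules just stated are easy to get wrong when editing many fields by hand, a tiny pre-flight check can be handy. The sketch below is not part of MOIL; the function name and its arguments are hypothetical, and it only encodes the two rules given in the text (RELX larger than RVMX, and #lis between 5 and 20).

```python
def check_dyna_settings(rvmx, relx, lis):
    """Return a list of warnings for the cutoff and list-update settings."""
    warnings = []
    # The electrostatic cutoff must be larger than the van der Waals cutoff.
    if relx <= rvmx:
        warnings.append("RELX must be larger than RVMX")
    # The non-bonded list update frequency should stay within 5..20 steps.
    if not 5 <= lis <= 20:
        warnings.append("#lis should be between 5 and 20")
    return warnings


print(check_dyna_settings(rvmx=8.0, relx=9.0, lis=10))   # the values used in this tutorial
print(check_dyna_settings(rvmx=12.0, relx=9.0, lis=10))  # the forbidden example from the text
```

An empty list means that neither rule is violated; it says nothing about whether the remaining parameters are sensible.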
It does not make sense from a physical viewpoint since the van der Waals interactions decay like $r^{-6}$ while the electrostatic interactions decay much more slowly, like $r^{-1}$. Regardless of whether it makes sense or not, MOIL does not check for correctness in this case. However, it uses a few speed-up tricks that rely on the distance hierarchy I just mentioned. It produces wrong results if you set the wrong hierarchy! So be warned, be careful, and be right. We also want to check the box near the variable HVDW; it means that the hydrogen atoms have a small (non-zero) van der Waals radius which prevents them from collapsing on top of other (negative) atoms. Adding (small) hard cores to the hydrogen atoms adds to the stability of the calculations and prevents a Coulombic “explosion” if opposite charges occupy the same position in space (something which is allowed if the van der Waals radius of the atom is zero). A zero van der Waals radius for hydrogen atoms is used in OPLS, the force field that we employ. There are also quite a few variables that look something like NOxx NOxx NOxx, which reminds me of the way I talk to my son. Seriously, these parameters allow us to turn off different energy options. For example checking the box near NOEL will turn off the electrostatic interactions. These options are mostly convenient for debugging. We leave all the NOxx boxes unchecked and move on to the Symmetry Parameters. Click on Display on the right. In the newly opened window titled Symmetry Parameters for Dynamics we need to work on several boxes. First we simulate water equilibration (as before). For convenience some of the explanations are repeated. The box that we cut to begin with (60x60x60 angstrom) is likely to have hard collisions at the box boundary. We therefore start the calculation with a somewhat larger box and equilibrate it to the desired box size. The first line of the new window requires filling the boxes for XTRA YTRA and ZTRA.
These are the parameters for the box sizes we begin with. The starting box should be larger by several angstroms compared to the desired box size, so we use 64 for all three. In the line below with the entries XTR2 YTR2 and ZTR2 we put the final desired size of the box (60) three times. This is exactly the same as we did before. And here comes the exciting new part. To allow for Ewald summation check the box near EWAL and make DTOL 0.0001 (the allowed error parameter). It is also good to set GRDX GRDY GRDZ (the grid sizes) to 32 each. This is it. We are done with the Symmetry part. Close the Symmetry window. We are back at the Dyna Information sheet, continuing to our next adventure at Constraint Parameters. Click on Display at the right side of Constraint Parameters. There are two types of constraints that we wish to turn on. The first requires a click near MSHK (the line of Matrix SHaKe). It makes the water molecule TIP3 rigid. Since TIP3 was designed as rigid (by W. Jorgensen), it makes sense to keep it so. The second constraint that we wish to use is NFRoZen. It picks the particles that will NOT be frozen in the upcoming simulation. In our case we should select (using the Pick command on the right) the TIP3 molecules. The rest of the particles (i.e. the protein) will be frozen in their initial configuration, allowing relaxation of the water molecules (only). Click on Edit on the right; in the new window that appears scroll the bar in the middle until the name TIP3 shows up in the list of monomer names. Click on one of the TIP3 entries (it does not matter which) and then click on Insert in the center. In the selection line below we will see the line “chem mono TIP3”. Click on OK. You have returned to the Constraint Parameters window and the selection has been transferred to the right of NFRoZen. Click on Close. We are back at the Dyna Information window. Click on Run Locally.
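The box compression set up through XTRA/YTRA/ZTRA (64) and XTR2/YTR2/ZTR2 (60) can be pictured with a short sketch. It assumes, as described earlier, that the edge shrinks linearly at every step, and it further assumes that the compression is spread over the #equ equilibration steps; the function and its argument names are made up for illustration and are not MOIL code.

```python
def box_edge(step, n_equ, start=64.0, final=60.0):
    """Box edge (angstrom) at a given step for a linear compression over n_equ steps."""
    if step >= n_equ:
        # After the compression period the box stays at its final size.
        return final
    return start + (final - start) * step / n_equ


for step in (0, 1250, 2500, 5000):
    print(step, box_edge(step, n_equ=5000))
```

With 5000 steps the edge changes by less than a thousandth of an angstrom per step, which is what lets the water adjust gradually instead of colliding with its periodic images.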
You will receive a message that the program is starting and immediately an error message that states “Failed to run… “. This should not really surprise us given our past experience. Let us see if we can figure out what the problem is. Click on View & Save. A new window shows up titled “Saved ErrMsg File”. Use the right side bar to scroll down to the end of the file. There you will find the message “The charge of the system is 1.000000…”, followed by a RED ALERT and the suggestion “Add counterion and try again”. Ouch, here is something we indeed need to fix. The Ewald summation sums the electrostatic energies out to infinite distance, duplicating the basic water box by symmetry. This is an approximation since in real solutions of water and proteins the symmetry is obviously lower. Nevertheless, it is one of the better approximations we have at present. However, if the box is charged then the summation over boxes created by the symmetry operation will make the system macroscopically charged. This is clearly incorrect modeling; we require that the system be neutral. The protein at hand has a net charge of +1 electronic charge. What we need to do at present is to add a counter ion and make sure that the box is electrically neutral. Where should we put the ion? The solution will be equilibrated as discussed above (with an ion or not), so there is no need for high accuracy in placing the ion (unless there are specific binding properties for the ion that require more detailed modeling). Since no counter ion was seen in crystallography, we assume it is a mobile ion and can be placed anywhere in the box. The way we will do it is by replacing one of the many water molecules with a chloride ion. We need to make changes in two files: the coordinate file (*crd) and the connectivity file (*wcon). Luckily for us, moil is reasonably well equipped to deal with ions and the required modification of coordinate files.
We return to the main menu of moil and select Assemble/Build Connectivity/Reload Config File. Assuming that you are in the same directory that we worked in before you can identify a file 2BW9_con.in, highlight it with the mouse and select OK. A Connectivity File Information worksheet is now open. Since we reloaded the old file a lot of entries were already refilled, saving us time and aggravation. At the third file line (the line that says Polymer File) there is a button on the far right that says Add Ions. It looks like exactly what we need: click on it. We’ll replace a water molecule with a chloride ion, so check the box near Cl to use this type of ion. You then need to choose the water molecule that will be replaced. The simplest is to use the side bar to scroll to the end of the list and pick the number of the last water molecule (it should be number 5753, but this can change when the PDB structure is updated). Note: if you choose the water molecule first and then choose the Cl ion, you may find that the water molecule has become deselected in the list; therefore, choose the ion first… Finally, make sure that the crd file to update is named as our previously generated 2BW9.crd, if it was not there already, and then click on Save. You will receive a notice that one water molecule was changed on your behalf. If instead it says no ions were added, go back and try again, and ensure when you click Save that both a water (number) from the list and an ion type are selected, and that a valid filename is present. Return to the Connectivity File Information worksheet (it should have remained open). To improve the generation of this connectivity file we examine the section Configuration Parameters. Check the box near MSHK. It signals to the code to keep the distances within a water molecule fixed (the water molecule is treated like a rigid object).
From the perspective of the connectivity file (*wcon) it means that no internal bonds of water molecules should be included (they are kept rigid by geometric, not energetic, constraints), which makes the later calculations of the dynamics more efficient. Then also check the box near HVDW. We have checked an identical box in the Dynamics Information worksheet. It is actually more important to check it here. The connectivity file is where the information on the van der Waals radii of all the atoms is stored. It also generates appropriate 1-4 interaction terms that include the HVDW adjustment. After these two small adjustments we can hit the Run Locally button. The connectivity file should be generated successfully (and you should receive a message to that effect in a separate small window). Click on Continue in the Information window. We now return to the MOIL main menu and we are ready to try Dynamics again. A comment: obviously we would not have had to go through this trial and error series if we had been better informed. For example, if we had known that the system we prepared has one extra charge, we could have skipped the futile attempt at running dynamics (which we made before) and immediately after Processing PDB continued to generate new connectivity and coordinate files with the ions inserted. However (and this is a BIG “however”), we rarely know the charge of a protein unless we have already dug into it (a clever programmer could have done this as an internal MOIL check, something to think about in the future…). In this case it may be easier (within MOIL) to let the program find the net charge for you and then fix the connectivity file and add ions to the coordinate file. So the path we took up to this point is actually typical. Select from the main menu Calculate/Dynamics/Reload Config File. We can use the older file since it was ready to use once we fixed the overall charge of the system.
There is no need for extra practice of your typing and mouse clicks on this page anymore. Select 2BW9_dyn.in in the file window and click OK. The Dynamic Information page appears again. Make sure that all the variables that we inserted before are still there (you can check them by looking back in this document). If not, insert them. There is one additional item that we need to take care of after adding the ions, which belongs to the Constraint Parameters. Previously we froze everything with the exception of the water molecules (i.e. we froze all the protein atoms). Now we have a chloride ion which should be mobile too. So our selection of which particles NOT to freeze must be modified. Click on Display on the right hand side of the Constraint Parameters line. A new window shows up. Go to the NFRoZen line and click on Edit. When the Pick Lists window opens click on Clear at the bottom. Then drag the middle bar to the end of the list. Highlight the monomer CL and then click on Insert. Then click on the button to the left of OR and finally highlight a TIP3 and click on Insert again. The line of selection should have the text chem mono CL | chem mono TIP3 . Click on OK. Back in the Constraint Parameters … window we see that now both types of monomers, CL and TIP3, are not frozen in the simulations, which is what we wanted (you should see this indicated in the NFRoZen: PICK line). Click on Close. We return to the Dyna Information window. Are you ready for the big moment? Ready or not, click Run Locally. A window shows up, informs you that the program is running, and suggests that you click on Continue. Do it and go get a cup of coffee. First, you deserve it. Second, it will take some time to execute the 5,000 steps we requested. [On an Intel 2 GHz MacBook running the Windows version of the software, this computation took about 30 minutes.] While we are waiting I can tell you a secret: all the instructions you made are stored in a file 2BW9_dyn.in .
In general the graphic interface of MOIL saves input files with the extension *.in. It is useful sometimes to have a look to make sure you were understood correctly and to avoid some unpleasant surprises. If you are running on Windows (like I am doing now to diversify my CTRL-ALT-DELETE and ALT-COMMAND-ESC abilities) you can watch the progress of the dyna program by pressing CTRL-ALT-DELETE and clicking on Processes in the Windows Task Manager window. One of the processes should be dyna.exe. If all goes well it should consume a significant fraction of your CPU time. In fact you can use these input files to run commands from the keyboard (e.g. run dyna as “dyna < 2BW9_dyn.in > 2BW9_dyn.out”). Running from the keyboard with text commands is particularly useful when preparing scripts to launch a large number of jobs, but is beyond the scope of the present tutorial [running programs from the command line requires that the folder containing the executables is in your execution path; adding folders to the path is discussed at the top of this tutorial]. Ok, the job is over. The Zmoil window just showed up and invited you to free graphic entertainment. Close that window down; we are here to do some serious stuff and have no time for further distractions. We need to extract the last structure from the dynamics file (*dcd) and to make it ready for yet another Dynamics run in which the protein will be allowed to move (finally). In the Main window select Analyze/Convert Coordinates/New Config File. A new window opens titled Convert Coordinates. We need to make a better choice of the Input Coordinate file. Click on Browse and highlight the file 2BW9_dyn.dcd, then press OK. Under Parameters two numbers are requested. LPST is the index of the starting structure and LPEN is the index of the final structure. In a *dcd file we are keeping multiple coordinate sets that were generated during the simulations.
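To pick LPST and LPEN we need the index of the last coordinate set in the file. A minimal sketch, assuming that the structures are numbered from 1 and that one set is written every #crd steps (the run above used #ste 5000 and #crd 1000); the function name is hypothetical.

```python
def last_frame_index(n_steps, save_every):
    """Number of saved coordinate sets, which is also the 1-based index of the last one."""
    if save_every <= 0:
        raise ValueError("save_every must be positive")
    return n_steps // save_every


lp = last_frame_index(n_steps=5000, save_every=1000)
print(f"LPST = LPEN = {lp}")
```

The same arithmetic gives 100 structures for the very first run, which used 100 steps with #crd set to 1.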
The number of structures is determined by the total number of integration steps (#ste) divided by the frequency of saving the coordinate set to the disk (#crd). In our case #ste is equal to 5,000 and #crd to 1,000. Hence we expect 5 structures in the 2BW9_dyn.dcd file. We wish to extract the last structure (the best equilibrated) and to use it to continue the simulations. Therefore we set both LPST and LPEN to 5. Continuing to the Output Coordinate File a line below, we need to be a little creative and invent a new name for the file with the equilibrated structure. Let it be 2BW9_eq.crd, which is what we should type in. The rest can be left “as is”. Click on Run Locally. A happy report states that ccrd performed WITHOUT red alert.

Step 2: Our goal is to equilibrate the protein structure together with the water. We equilibrated the water molecules first, in Step 1, since the water molecules were far from correctly equilibrated. We move on to equilibrate the structure of the protein together with the water. We expect the initial structure to be pretty good (determined by crystallography and refined with an energy function which is similar in spirit to the one in MOIL). So this should be a more gentle calculation. We will run a simulation in which both the protein and the water are allowed to move and we will slowly heat up the system from 1K to 300K. Note that there is no need for minimization here. The low temperature run will take care of bad contacts. In most cases, if the initial structure is of reasonable quality, minimization is not needed. We now return to Calculate/Dynamics/Reload Config File. Highlight 2BW9_dyn.in in the File Open window and click OK. The Dyna Information window appears. The selections we made previously are there, but we will need to work through some of them and make changes. The Coordinate File in the second line should be changed to the equilibrated structure 2BW9_eq.crd. We will make only one change in the Basic Parameters.
Go to the MULtiple Temperature. Make sure that the entry for #TMP is 1. Go one line lower and insert 1 in the box to the right of TMPI, and 300 in the box to the right of TMPF. The initial temperature will be 1K and the final temperature at the end of 5000 equilibration steps will be 300K. The temperature will increase linearly in small steps (each step) by velocity scaling throughout the simulation. Next we will edit the Symmetry Parameters; go to the right of that line and click on Display. Change the entries for XTRA YTRA and ZTRA to 60 and erase the entries for XTR2 YTR2 and ZTR2. The box is relaxed at its correct size and it is now fixed. The rest of the parameters are left as they are (the Ewald box is checked, DTOL is 0.0001, and GRDX, GRDY, GRDZ (the grid sizes used by Particle Mesh Ewald for long-range electrostatic calculations) are set to 32). Click OK. We move on to the Constraint Parameters and click on Display on the right. A window of Constraint Parameters shows up. The present simulation is with the protein. We will fix the length of all the bonds in the protein by checking the box on the right of SHKB in the first line. We also make sure that MSHK is checked (it should be from the previous setting of the file). Erase all text from the rest of the worksheet. There is still something we need to do. If we go ahead and do the simulation as is, the protein will diffuse in the water box, moving its center of mass away from the center of the water box. It can continue to the extreme (and many of our simulations are long enough to see it) in which the protein simply diffuses out of the water box, which is not good. We therefore simulate the protein with constraints on its center of mass position and on the overall orientation of the protein molecule (the last is convenient for visualization). These constraints should not affect the internal dynamics of the protein molecule, which is what we are interested in.
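Going back for a moment to the heating schedule set with TMPI and TMPF above, here is a minimal sketch of a linear temperature ramp combined with velocity scaling. It is not MOIL's code: the function names are hypothetical, and the only physics used is that the kinetic temperature is proportional to the square of the velocities, so a uniform velocity scale factor is the square root of the temperature ratio.

```python
import math


def target_temperature(step, n_steps, t_initial=1.0, t_final=300.0):
    """Linearly interpolated target temperature (K) at a given step."""
    frac = min(step, n_steps) / n_steps
    return t_initial + (t_final - t_initial) * frac


def velocity_scale(t_current, t_target):
    """Factor applied to all velocities so that the kinetic temperature becomes t_target."""
    return math.sqrt(t_target / t_current)


print(target_temperature(2500, 5000))  # target halfway through the ramp
print(velocity_scale(150.0, 300.0))    # doubling the temperature scales velocities by sqrt(2)
```

Because the target rises by only about 0.06K per step over 5000 steps, the scale factor stays very close to one and the heating is gentle.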
We will impose the center of mass and orientation constraints with the ORIEnt option. (Another way of keeping the center of mass of the protein at the center of the water box is the Tether option, which adds a spring to the center of mass of the protein. In this simulation, however, we will go with ORIEnt.) ORIEnt will not work if the protein changes its shape greatly during the calculation (e.g. in protein folding), since it assumes that the fluctuations of the structure relative to the original structure are small. If the fluctuations are large, Tether is a better option than ORIEnt.

Click on Edit on the right side of ORIEnt. It is not necessary to select all protein atoms to fix the protein in the box; the simplest and cheapest solution is to apply this constraint to the alpha carbons (CA) of the protein. Select (highlight) any single CA atom that you see on the right of the new Pick Lists window. Then click Insert. In the selection line you will now see chem prtc CA, which means we have picked any particle (atom) with the name “CA”. Click OK. In the Constraint Parameters window click on Close. In the Dyna Information window click on Run Locally. This is another run that will take a while. Be patient (and one cup of coffee should have been enough).

Step 3: Production run. By now the run has finished. The next phase is actually quite similar to Step 2. We extract the last structure of the equilibration run and use it to initiate a run with moving protein and water, now at 300K, that we can analyze in many ways. This trajectory should be representative of the way the protein fluctuates thermally in nature (so we hope). Select Analyze/Convert Coordinates/Reload Config File. In the new File Open window highlight 2BW9_ccr.in and click OK. The Convert Coordinates window opens. We can run this file “as is”. Click on Run Locally.
We overwrite the older equilibrated structure (the one in which we allowed only the water to move), but this is fine since we do not need that file anymore. We get a familiar report about the success of the ccrd program.

We go back to Calculate/Dynamics/Reload Config File and highlight the file 2BW9_dyn.in in the File Open window. As always click OK. In Dyna Information change the number of integration steps (#ste) to 20000. We are going to make a more serious run now, in which the total number of steps will be 20,000 and the number of equilibration steps will be kept at 5,000. We also go a little lower and change TMPI from 1 to 300. The run is now at a constant temperature of 300K, rather than gradually increasing the temperature from the cold of outer space (1K) to the warmth of room temperature (300K). During the equilibration period the velocities are re-scaled to the kinetic energy that corresponds to 300K. Click on Run Locally. This is a short set of instructions, but the run will take considerably longer.

Use zmoil to watch the set of structures stored in the *.dcd file. Find the display option that will allow you to present the carbon monoxide molecule (only) as two space-filling spheres. It is evident that during our short simulation the ligand just bounces a little in the heme pocket. One of the puzzles in the function of this protein (myoglobin) is the way the ligand enters (or leaves) the binding site from the solvent. Drawing the protein with a space-filling model illustrates that there is no obvious way for the ligand to enter the binding site, and the protein must execute thermal fluctuations to allow this event to happen. Experimentally, the escape events require tens to hundreds of nanoseconds to occur (about 10^-7 seconds). The simulation we just ran lasted picoseconds (10^-12 seconds). It is therefore unlikely that we will succeed in probing these events in a straightforward way.
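To put rough numbers on this gap, here is a small Python sketch of the bookkeeping. It is only an illustration under stated assumptions: the time step of 1 fs is my assumption (this part of the tutorial does not state it), I assume #crd was left at 1,000 for the production run, and the function names are hypothetical helpers rather than anything in MOIL.

```python
ASSUMED_DT_FS = 1.0  # assumption: integration time step in femtoseconds


def n_saved_structures(n_steps, save_every):
    """Number of coordinate sets written to the dcd file (#ste / #crd)."""
    return n_steps // save_every


def simulated_time_ps(n_steps, dt_fs=ASSUMED_DT_FS):
    """Total simulated time in picoseconds (1 ps = 1000 fs)."""
    return n_steps * dt_fs / 1000.0


def steps_needed(target_ns, dt_fs=ASSUMED_DT_FS):
    """Integration steps required to cover target_ns nanoseconds
    (1 ns = 1,000,000 fs)."""
    return round(target_ns * 1_000_000 / dt_fs)


if __name__ == "__main__":
    print("structures, equilibration run:", n_saved_structures(5000, 1000))
    print("structures, production run:", n_saved_structures(20000, 1000))
    print("production run length (ps):", simulated_time_ps(20000))
    print("steps for 100 ns:", steps_needed(100))
    print("ratio to our run:", steps_needed(100) // 20000)
```

Under these assumptions the 20,000-step production run covers 20 ps, while a single 100 ns trajectory would need 100,000,000 steps, that is, 5,000 times more than the run we just made.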
This underlines the need for a significant increase in computer power and/or in computational methods to address long-time phenomena in biophysics. Our group is at the front of this research, and computational methods that are unique to moil or were developed under the umbrella of this program address exactly these problems (calculation of reaction coordinates with action optimization, codes: chmin and sdp; the LES methodology, codes: everywhere; and the Milestoning approach, codes: fp and memeqns). In the next set of blogs I will illustrate the use of some of these techniques.