Protein Structural Alignment Servers

A plugin developed using TM-Score alignment for use with the PyMOL open-source molecular visualization system

By: Michael Chang and Alex Koszycki
Drexel University
BMES 546: Biocomputational Languages
Professor Ahmet Saçan, Ph.D.
December 2013

I. Abstract

This report discusses the development and function of a protein alignment plugin for the PyMOL molecular visualization system. The plugin uses the TM-Score algorithm, available on a publicly accessible server, to obtain a rotation translation matrix. The matrix is then used to transform a protein and display the aligned results in PyMOL. The report first introduces protein alignment and the problem statement, then discusses the objective of the plugin and explains the methods by which it operates. On a test case, the plugin's alignment matches the Jmol representation produced by the TM-Score server. Lastly, the challenges of the development process and potential future work are discussed.

II. Introduction

Protein structural alignment is used widely in investigative research, drug development, and biochemical engineering. The similarities between proteins can shed light on many aspects of origin and function. Similar structure may indicate that proteins belong to the same family or have developed from a common ancestor. More importantly, the function of an unknown protein can be investigated by comparing its structure to proteins of known biological function. The structure of a protein can help predict intermolecular forces, binding affinities, possible substrates, and the role of the protein in a living cell. For these reasons, many computational methods for measuring protein similarity have been developed. Many of them use a root-mean-square deviation (RMSD) between the central (alpha) carbon atoms of amino acids during alignment (Carugo and Pongor 1470-73).
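As a point of reference for the RMSD measure mentioned above, the following is a minimal sketch of how it can be computed for two sets of already paired and superimposed coordinates. The function name rmsd and the tuple-based coordinate format are illustrative assumptions and are not taken from the plugin.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    The points are assumed to be paired already (point i of coords_a corresponds
    to point i of coords_b) and to be expressed in the same coordinate frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between the paired atoms
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))
```

Because every pair contributes its squared distance with equal weight, a few badly matched atoms can dominate the value.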
Using RMSD as a basis, many researchers have improved on the algorithm and accounted for factors that decrease its effectiveness in application. Two such researchers, Drs. Yang Zhang and Jeffrey Skolnick, developed several methods and made them available online on servers for public use. One of these is TM-Align, a method that uses TM-Score translation rotation matrices (Zhang and Skolnick 2302-09). The TM-Score weights close atom pairs more strongly than distant pairs, and is therefore more sensitive to topology and fold than RMSD-based methods. This makes it particularly useful for investigating the function of even very diverse proteins.

In the investigation of protein structure, researchers often rely on molecular visualization tools. One of the most common of these tools is PyMOL, an open-source platform created by Dr. Warren Lyford DeLano (DeLano 44-53). PyMOL is built on the Python programming language and can in turn be extended with Python, allowing the molecular visualization and crystallography community to create custom plugins for their own applications of the system.

Figure 1: Ubiquitin displayed in PyMOL.

III. Objective

The objective of this project is to create a plugin for PyMOL that takes two Protein Data Bank (PDB) codes via user input and plots the corresponding structurally aligned proteins in PyMOL. The goal is an easily accessible tool that users can reach in the same way as many of PyMOL's own functions. For this reason, the user does not need to download PDB files (*.pdb). These files are nevertheless used by the server to calculate the TM-Score, and by the plugin to display the resulting protein alignment.

IV. Methods

The plugin takes in two PDB codes, each representing an individual protein, and displays the alignment of the two proteins in PyMOL.
Before any action can be taken, the plugin framework is defined using a self-initialization function, which is called during PyMOL's startup routine. This function adds an item called Protein Alignment to the Plugin menu. The user activates this item to start the plugin. When this is done, the first plugin-specific function, fetchPDBDialog(), is called. It uses the tkSimpleDialog library to get user input.

The plugin needs to know which proteins it is to align. Each protein is identified by a unique four-character code known as the PDB accession code, or simply PDB code. Figure 2 shows the two methods by which the PDB codes may be entered. In Figure 2a, the user selects the plugin from the PyMOL Plugin menu. This opens a query box, created with the tkSimpleDialog function askstring(), that asks for a PDB code. The user enters the code, presses 'OK', and is prompted to enter a second code. In Figure 2b, the user simply types the command 'protalign' followed by the two PDB codes, separated by commas. This command is created using the PyMOL extend() capability, which allows the plugin to integrate seamlessly with the PyMOL environment.

Figure 2(a): User input via plugin menu, and (b) via PyMOL command line function.

Each code is then placed in a URL request to the Research Collaboratory for Structural Bioinformatics Protein Data Bank website, at rcsb.org. The request is made with the urllib2 urlopen() function, which opens the download page for the specific PDB file. The data is decompressed using the zlib function decompress() to obtain the files that correspond to the PDB codes. The files are passed to the PyMOL function read_pdbstr(), which reads in and plots the corresponding protein. During this process, the first protein is selected using the PyMOL select() function. This selection is important later on, when the protein is rotated and translated. Figure 3 shows the flow of data through the plugin.
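The download and decompression step described above can be sketched as follows. This is an illustration under stated assumptions rather than the plugin's own code: it uses Python 3's urllib.request in place of urllib2, the URL pattern in RCSB_URL is an assumed download address that does not appear in this report, and the helper names pdb_url, decompress_pdb, and fetch_pdb are hypothetical.

```python
import zlib
from urllib.request import urlopen

# Assumed URL pattern for gzip-compressed PDB entries; not taken from the report.
RCSB_URL = "https://files.rcsb.org/download/{code}.pdb.gz"


def pdb_url(code):
    """Build the download URL for a four-character PDB code (hypothetical helper)."""
    code = code.strip().upper()
    if len(code) != 4 or not code.isalnum():
        raise ValueError("PDB codes are four alphanumeric characters")
    return RCSB_URL.format(code=code)


def decompress_pdb(raw):
    """Decompress gzip-wrapped PDB data into text using zlib."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    return zlib.decompress(raw, 16 + zlib.MAX_WBITS).decode("ascii", errors="replace")


def fetch_pdb(code, timeout=30):
    """Download one PDB entry and return its contents as a string."""
    with urlopen(pdb_url(code), timeout=timeout) as response:
        return decompress_pdb(response.read())
```

Inside PyMOL, the returned string is the kind of input that read_pdbstr() takes, together with an object name for the loaded protein.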
After fetchPDBDialog() obtains the protein information, it is uploaded to TM-Score. TM-Score outputs a rotation translation matrix. This matrix is used to transform the first protein, and the results are displayed in PyMOL.

Figure 3: Process flowchart for the protein alignment plugin. Protein data is obtained from user input, and files are sent to TM-Score, resulting in a rotation translation matrix. This is used to rotate the first protein and display it in PyMOL.

Figure 4: Data structure utilization in the PyMOL plugin.

Up to this point, the plugin has only obtained the PDB codes and displayed the unaltered proteins. Figure 4 shows these inputs as the .pdb files, and how the plugin uses data structures. The next step is to interact with the TM-Score server. This part of the plugin is defined as the second function of our design, named simply plugin(pdbCode1, pdbCode2). First, a URL string is defined that points to the result page of the Zhang alignment server (http://zhanglab.ccmb.med.umich.edu/cgi-bin/TMscore.pl). The PDB file data is stored in a Python dictionary called values, with 'pdb01' and 'pdb02' as keys. This dictionary is not shown in the figure because it is used only for convenience when building a URL request with the urllib2 library Request() function. The source code of the response page is then read in using urlopen() and read(). Next, the find() function is used to search the source code for the rotation translation matrix, which is displayed on the page as shown in Figure 5. The split() function is used to separate the values of the matrix and save them individually. The t(i) column shown in Figure 5 represents the translation vector, while the 3x3 u columns represent the rotation matrix.

Figure 5: Rotation translation matrix displayed by TM-Score server.

These numbers are then passed to the PyMOL function transform_selection().
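The parsing step can be illustrated with a short sketch. It assumes the server reports the matrix as a header row containing t(i), u(i,1), u(i,2), u(i,3) followed by three numeric rows; the exact page layout is an assumption, the numbers in SAMPLE_BLOCK are made up for illustration, and the function names parse_rotation_block and to_pymol_matrix are hypothetical.

```python
# Made-up example block in the assumed layout; not real server output.
SAMPLE_BLOCK = """
 -------- rotation matrix to rotate Chain-1 to Chain-2 ------
 i          t(i)         u(i,1)         u(i,2)         u(i,3)
 1      1.5000000000   0.0000000000  -1.0000000000   0.0000000000
 2     -2.0000000000   1.0000000000   0.0000000000   0.0000000000
 3      0.2500000000   0.0000000000   0.0000000000   1.0000000000
"""


def parse_rotation_block(page_text):
    """Return (t, u): the translation vector and the 3x3 rotation matrix."""
    lines = page_text.splitlines()
    for index, line in enumerate(lines):
        # The header row marks where the three numeric rows begin.
        if "t(i)" in line and "u(i,1)" in line:
            rows = [row.split() for row in lines[index + 1:index + 4]]
            break
    else:
        raise ValueError("rotation matrix block not found")
    if len(rows) != 3 or any(len(row) < 5 for row in rows):
        raise ValueError("rotation matrix block is incomplete")
    # Each row is: row index, t(i), u(i,1), u(i,2), u(i,3)
    t = [float(row[1]) for row in rows]
    u = [[float(value) for value in row[2:5]] for row in rows]
    return t, u


def to_pymol_matrix(t, u):
    """Flatten t and u into 16 floats, row by row, with t in the fourth column."""
    flat = []
    for i in range(3):
        flat.extend(u[i] + [t[i]])
    flat.extend([0.0, 0.0, 0.0, 1.0])
    return flat
```

Searching for the header row rather than a fixed character offset keeps the sketch tolerant of small changes elsewhere on the page. The resulting list of 16 floats is the matrix argument expected by transform_selection().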
This function rotates and translates a selection based on a matrix like the one output by the server, though rearranged slightly. The matrix is passed as a Python list of 16 values representing a 4x4 matrix in the format shown in Figure 6. The selection passed to this function is the first protein, which was selected earlier in the code using the select('all') command; this is more convenient than a specifically targeted command. The default name of a selection in PyMOL is '(sele)', and this name is passed to transform_selection(). The function takes several further arguments (state, log, homogenous, and transpose) whose effects are only sparsely documented. In this plugin, setting all of them to 0 was found to produce the correct behavior.

u(1,1)  u(1,2)  u(1,3)  t(1)
u(2,1)  u(2,2)  u(2,3)  t(2)
u(3,1)  u(3,2)  u(3,3)  t(3)
0.0     0.0     0.0     1.0

Figure 6: Rotation translation matrix input to transform_selection() as float values. u represents the rotation matrix while t represents the translation vector.

With homogenous set to 0, transform_selection() treats the right-hand column t as the translation applied after the rotation, and the three 0.0 values along the bottom row as a translation applied before the rotation. The TM-Score server reports a rotation followed by a translation, so t is placed in the right-hand column and the bottom-row values are left at 0.0.

Lastly, the plugin clears the selection using the PyMOL delete() function. This does not delete any proteins or information; it simply clears the selection so that it is not in the user's way. The PyMOL zoom('all') command is used to center the camera on the fully aligned proteins, and the show('cartoon') command is called to show cartoon representations of secondary structures. This visualization helps to show the aligned structures clearly. Finally, the entire plugin(pdbCode1, pdbCode2) function is added to the PyMOL environment using extend(), under the command name protalign.

V. Results

The results of the plugin on our test cases matched the TM-Score alignment.
Figure 7 shows the alignment of PDB code 1V07 (THRE11VAL Mutant MiniHemoglobin) and PDB code 1HBI (Oxygenated Scapharca Dimeric Hemoglobin). Figure 7a shows the output of the plugin, and Figure 7b shows the Jmol preview prepared by the TM-Score server. The two alignments appear identical.

Figure 7: Results of (a) our Protein Alignment PyMOL plugin and (b) TM-Score server.

VI. Conclusions

Although the plugin succeeded in using the TM-Score server to align two proteins in PyMOL, many aspects of the development process proved challenging. Most notable among these was the lack of comprehensive documentation for extending PyMOL through its application programming interface (API). Although there were some very useful resources on which we relied, such as the PyMOL wiki and the PyMOL mailing list archives, many of the finer details were omitted from descriptions and examples (PyMOLWiki, PyMOL-Users Mail Archive). This is primarily because the documentation focuses on user interaction with PyMOL rather than on extending the API. As a result, we did a great deal of research during development and looked to existing plugins, even those with dissimilar purposes, for examples. Guessing and testing were then needed to work out how a function actually behaves. The transform_selection() function is one such poorly documented case.

Debugging the plugin also proved difficult. For the majority of the process, a critical error simply resulted in PyMOL (a) not initializing the plugin, (b) not showing any change at all, or (c) crashing outright. In each of these cases, it was not possible to locate the source of the error, since the plugin had not interacted correctly with PyMOL. Debugging tools were investigated, but ultimately proved unwieldy or expensive.
We resorted to logical methods of investigation, such as segmenting the code into small modular components and testing each individually to zero in on errors. Once critical errors were accounted for, PyMOL would return some meager debugging information, such as the line number at which a process failed. At this stage it was easier to investigate errors, though no easier to test changes. After every change in the code, it was necessary to uninstall and reinstall the plugin, restart PyMOL, and rerun the plugin. Setting up the coding environment was difficult as well, particularly when experimenting with certain libraries and their interaction with PyMOL. In the end, we minimized these issues by relying on pre-installed and readily accessible packages in the plugin. For example, instead of using Poster, the urllib and urllib2 libraries were used to communicate with the server.

Nevertheless, it is satisfying to see the working function behave without a hitch. It is accessible and easy to use for comparing protein structures. There are a few improvements that could be made to the code in the future, given more time. One would be to handle errors in input more gracefully. The query box version of user input does this well, but the command line tool does not tolerate errors in input, and causes PyMOL to quit unexpectedly. It is unclear why this is so, and further investigation would be necessary to remedy it. Another possible improvement would be to accept a variable number of proteins and align each one individually to the first protein entered. This would require some structural rearrangement of the plugin, and possibly splitting certain tasks into smaller functions.

In summary, the protein alignment plugin was a challenging yet rewarding project, and functions well as an extension of the PyMOL API. Given two PDB codes from the user, the plugin obtains the .pdb files, sends them to the TM-Score server, and retrieves the resultant rotation translation matrix.
This matrix is used to transform the first protein and plot the aligned structures in PyMOL. From there, the structures may be manipulated through PyMOL's user interface for further investigation.

VII. References

Carugo, Oliviero, and Sándor Pongor. "A normalized root-mean-square distance for comparing protein three-dimensional structures." Protein Science. 10.7 (2001): 1470-73. Web. 9 Dec. 2013. <http://onlinelibrary.wiley.com/doi/10.1110/ps.690101/full>.

DeLano, Warren. "PyMOL: An Open-Source Molecular Graphics Tool." CCP4 Newsletter on Protein Crystallography. 40. (2002): 44-53. Web. 9 Dec. 2013. <http://www.ccp4.ac.uk/newsletters/newsletter40.pdf>.

PyMOL-Users Mail Archive. <https://www.mail-archive.com/[email protected]/info.html>.

PyMOLWiki. <http://pymolwiki.org/index.php/Main_Page>.

Zhang, Yang, and Jeffrey Skolnick. "TM-align: a protein structure alignment algorithm based on the TM-score." Nucleic Acids Research. 33.7 (2005): 2302-09. Web. 9 Dec. 2013. <http://nar.oxfordjournals.org/content/33/7/2302.short>.