B RIEFINGS IN BIOINF ORMATICS . VOL 10. NO 3. 247^258 Advance Access publication March 30, 2009 doi:10.1093/bib/bbp013 2D molecular graphics: a flattened world of chemistry and biology Peng Zhou and Zhicai Shang Submitted: 31st December 2008; Received (in revised form) : 17th February 2009 Abstract Molecular graphics provides an intuitive way for representation, modeling and analysis of complex chemical and biological systems. It is now widely used in the theoretical chemistry, structural biology, molecular modeling and drug design communities. Traditional molecular graphics techniques mainly dedicate to showing molecular architectures at three-dimensional (3D) level. However, in some occasions the two-dimensional (2D) representation of molecular configurations, profiles, behaviors and interactions may be more readily acceptable for audiences, especially when we need to describe abstract information in a straightforward way or to present numerous data in schematic diagrams. In recent years, 2D representation methods/tools have been developed rapidly for various purposes, ranging from the aesthetic depiction of atomic arrangement for small organic molecules to schematic layout of complicated nonbonding network across the biomolecular binding interfaces, and have received considerable interest in the fields of chemistry, biology and medicine. In this article we first propose the term of 2D molecular graphics to cover the spectrum of 2D representing chemical and biological systems, we also give a comprehensive review on the methods, tools and applications of 2D molecular graphics. Keywords: 2D molecular graphics; schematic diagram; chemical and biological system; bioinformatics INTRODUCTION Graphic language, symbolic language and written language represent the human communication ways at different abstract levels, in which the graphic language presents information in an intuitive way and, apparently, is the most understandable manner for audiences—‘a picture is worth a thousand words’ [1]. In the fields of chemistry, biology and medicine, molecular graphics is widely used for study and dissemination of molecular structures and functions. It is assumed that microscopic world could be visualized on computer screen using a dozen of metaphors, such as the use of spheres to represent atoms or ribbons to represent protein chains, because we are generating synthetic images of objects that are far smaller than the wavelength of light [2]. Traditionally, the subject of ‘molecular graphics’ is always associated with the stereo representation of molecular architectures in three-dimensional (3D) space by using various models such as CPK, BALL & STICK, and SURFACE, etc. [3]. In recent years, however, such a viewpoint has been changed with the emergence of a number of methods/tools which dedicate to schematically depicting chemical and biological information on two-dimensional (2D) diagrams. These simplified diagrams are aesthetic and straightforward for the exhibition of molecular configurations, profiles, behaviors and interactions. Therefore, this area is considered as a branch of molecular graphics, and here we call it 2D molecular graphics. In fact, 2D and 3D visualization methods are complementary. They are used for different purposes and/or different occasions. For example, a rotatable molecular stereograph provides a faithful representation of available data but requires considerable operator time and practice to thoroughly perceive and is often an inconvenient way to communicate information among peers. A flattened Corresponding author. Zhicai Shang, Department of Chemistry, Zhejiang University, Hangzhou 310027, China. Tel: þ86 571 87952379; Fax: þ86 571 87951895; E-mail: [email protected] Peng Zhou is a PhD student at the Institute of Molecular Design & Molecular Thermodynamics, Department of Chemistry, Zhejiang University, China. His research focus is in graphic design and analysis of chemical and biological systems. Zhicai Shang is Professor of Department of Chemistry at the Zhejiang University, China. His research interests include the computeraided molecule design, bioinformatics and quantitative structure–activity relationship. ß The Author 2009. Published by Oxford University Press. For Permissions, please email: [email protected] 248 Zhou and Shang image can be designed to be very easily understood even at a glance and can be readily shared as a printed hardcopy. OVERVIEW OF 2D MOLECULAR GRAPHICS Computer is a powerful tool to implement graphical operations and its development has had a dramatic impact upon 2D molecular graphics. It should always be remembered, however, that there is much more to 2D molecular graphics than the naissance of computer. The history of 2D molecular graphics could be traced back to the proposal of benzene’s structure by Kekule [4]. After that, the chemical structure became the most frequently used concept in chemistry and the related fields. In recent decades, 2D molecular graphics was further developed and applied in molecular biology. For example, protein topology diagram and RNA secondary structure diagram give a clear expression of the arrangement of secondary structure elements and functionally structural domains; the macromolecular surface map can intuitively reflect the distribution situation of residues on a protein surface. Nowadays, 2D molecular graphics is also a useful tool in computerassisted drug design; pharmacologists can easily identity and conveniently analyze pharmacophore and nonbinding interaction of receptor–ligand complexes by using diagrammatic analysis applications. These works suggest the 2D molecular graphics is a promising branch of molecular graphics and will be further developed rapidly in future. Here we categorize 2D molecular graphics into three aspects in terms of the research object and purpose: (i) Generation and drawing of chemical structures. (ii) Representation and annotation of biomolecules (proteins and nucleic acids). (iii) Schematic layout of molecular interactions. CLASSIFIED DISCUSSION OF 2D MOLECULAR GRAPHICS Generation and drawing of chemical structures Generating 2D chemical structures from molecular formulas, connection tables or spatial coordinates belongs to the earliest application of computer in chemistry. Nowadays, graphical drawing of chemical structures is a relatively mature technique, readily available in many commercial and academic software such as ChemDraw [5], Symyx/Draw (formerly named ISIS/Draw) [6], OGHAM [7], CACTVS [8], etc. The demand for depiction of molecular structures is great, because the stylistic conventions of such pictograms constitute a ‘natural language’ for chemists which can be rapidly comprehended, and is thus highly suitable for communication. Fortunately, chemical structures constitute a limited subset of all possible graphs. If one considers only organic chemistry, there can be formulated a relatively small set of rules which cover most of the common environments in which an atom type might occur. Once these rules are stated, the structure can be pieced together and an overall depiction obtained [9]. Correct, aesthetic and understandable are the basic requirement for generating chemical structures. In the early stage, researchers mainly adopted some simple constraints to generate chemical structures. First attempt by Zimmerman and Feldman [10, 11] used a library of basic substructures and combined them according to a set of rules. Although this method is particularly effective for drawing edgefused polycyclic ring systems such as steroids, it is limited in its generality by the size and nature of both the template set and the rule set, with the most difficult cases being bridged polycyclic systems. In the same time or later, several methods were proposed for the similar purpose. However these methods are all incapable of handling complex molecules and thus do not to be applied widely [12–14]. After 1980, intelligent algorithms were introduced into this field, obtaining a good effect on both computational efficiency and scale. Shelley [15] presented a heuristic approach to avoid overlapping atoms and bonds and to facilitate similar representations of the structural diagram for identical or similar structures. Hibbert [16] employed genetic algorithm to generate and display chemical structures, in which the atomic coordinates or bond angles are encoded into chromosome and the distances between all pairs of bonded and nonbonded atoms are served as the fitness function. The DEPICT program written by Weininger [17] was designed to convert SMILES format, the linear notation of molecular graph, into a depiction of molecular structure. The output display allows all aspects of SMILES representation of structure to be verified easily, including aromaticity, formal charge, bond order assignment and hydrogen attachment. A comprehensive review on the research status 2D molecular graphics 249 Figure 1: Some examples of 2D chemical structures generated by different algorithms. (a) Organometallic complex (ChemDraw [5]). (b) Boride (CACTVS [8]). (c) Carbon nano-tube (Clark’s method [9]). (d) Cage-shaped carbon molecule (OGHAM [7]). of 1980s and 1990s for structure diagram generation can be found in Ref. [18]. After 2000, a number of methods and programs were proposed in elaborate consideration to generate, depict and edit 2D molecular structures. Their application scope covers not only the common organic compounds but also some special cases such as organometallic complexes, borides, carbon nanotubes, cage-shaped carbon molecules, etc. (Figure 1). Boissonnat et al. [19] firstly defined the concept of molecular family that was a set of molecules sharing a connected supertree. Using the correspondences between atoms provided by the supertree, the drawing performing of molecular graphs can be accurately coordinated by a breadth-first traversal approach. Fricker et al. [20] further proposed an algorithm that was based on the classical scheme of a drawing queue placing the molecular fragments in a sequential way. This method extended the traditional prefabricated units developed for complex ring systems to automatically created drawing units for chains and rings which will then be assembled in a sequential manner. Recently, Clark and co-workers [9] have stated the two goals of a successful structure depiction method: (i) to maximize the fraction of molecular connection graphs that are readily discernible to the observer and (ii) to maximize the fraction of graphs which are aesthetically ideal and functionally equivalent to the efforts of a skilled artist. In addition, two works are deserved to be mentioned: (i) one is the chiral depiction algorithm by Maehr [21] who used the targeted redeployment of established stereobonds to serve as the stereodescriptors that can differentiate between enantiopure and racemic compounds and those that are enantiopure but whose chirality sense is unknown; (ii) another is the small-molecule topology generator PRODRG [22] which takes input from given coordinates or various 2D formats and automatically generates molecular topologies suitable for X-ray refinement of protein–ligand complexes. On the other hand, two java-based programs, JChemPaint [23] and MCDL editor [24], were published for generating and editing chemical structures. They are different from previous structure-generating programs in that their source codes are shared and collaboratively developed through the Internet, and these characteristics lead to that the open source programs can be rapidly spread and widely applied in the science community. Representation and annotation of proteins Protein topology representation Protein topology was originally investigated systemically by Crippen [25, 26] and gradually becoming a common concept of protein structures. 2D representation of protein topology structures is a useful tool in protein searching [27] and fold comparison [28, 29]. However, the main application of protein topology representation is, of course, to straightforwardly show the structural pattern and spatial arrangement of protein architectures. Protein topology representations can be classified into two styles: cartoon diagram and topology diagram. Cartoon diagram utilizes various symbols such as circles, squares, triangles and lines to represent protein’s substructures (Figure 2a), which was first adopted by Schulz et al. [30] and Rossmann et al. [31]. Levitt and Chothia [32] used a similar way to illustrate the arrangement of -helices and -sheets in 31 globular proteins with known conformations. Based upon the representation results, they arranged protein structures into four classes that are well known today, i.e. all- proteins, all- proteins, þ proteins and / proteins. Cartoon diagrams were later drawn manually in some publications to elucidate the arrangement fashion of protein 250 Zhou and Shang Figure 2: Stereoview of the Bacillus subtilis DppA monomer (center) (PDB entry: 1hi9). This figure was produced using PyMOL [50]. (a) Cartoon diagram produced using TOPS [38]. (b) Topology diagram obtained from PDBsum database [51]. structural elements [33, 34]. Flores et al. [35] have described a general algorithm for automatically generating protein cartoon diagrams, which was further improved by Westhead et al. [36, 37] and integrated into the TOPS database [38]. Recently, May et al. [39] proposed a modified version of protein cartoon diagram called PTGL in which the secondary structure units were notated as small circles and quadrates and the spatial neighbourhoods were represented as arcs between these units. Topology diagram consists of parallel/anti-parallel arrows (for strands) and cylinders (for helices) (Figure 2b). There were considerable publications that adopted topology diagrams to illustrate protein structural configurations [40–42], and several software tools were also available which take atomic coordinate data input and produce protein topology diagrams in different styles [43, 44]. The key notation describing topology diagrams were first introduced by Richardson [45] for -sheet topologies and further developed by Koch, Flower and Grigoriev et al. [46–48]. The topology diagrams are not only competent for clearly expressing simple protein motifs, such as the Greek-key and the Jelly-roll barrel motifs [49], but also performing well for showing complex protein global structures. Protein structure annotation The aim of structure annotations is to present the elusive structure-related information of proteins in a straightforward diagram, in qualitative or quantitative manner. Despite numerous protein diagrams were drawn in publications by manual approaches or using computer graphic software, the methods and programs used to automatically generate annotating diagrams for proteins were also in lack, relative to its extensive applications. A normal style for protein structure annotations is the linear representation which unfolds the structured proteins into a linear state, and then uses different patterns and colors to annotate structural implications. One of the mostly used aspects of linear representation is to show the arrangement of secondary structures in protein configuration. This is frequently adopted by database pages and publications [52]. Another important aspect of linear representation is to display structural features from sequence alignment [53, 54]. In addition, linear representation has many other applications in information presentation of protein structures at sequence level, such as protein hydrophilicity plot [55], flexibility plot [56] and structure validation plot [57] (GCG [58] and PSAweb server [59] are two useful tools to draw these plots based on protein sequences and structures). Here we present three interesting applications of linear representation: (i) DOMPLOT [60] is a standalone program used to generate line–box diagrams of protein chains in terms of their structural domains. This program is written in C and thus can be implemented efficiently. (ii) Sipos et al. [61] developed a rapid visual approach, pairwise repeat homology diagram, to visualize the protein sequence repeats detected by a profile HMM in a linear pair of sequences and to highlight their homology relations inferred by a phylogenetic tree. A Perl-script t2prhd was also provided by them for drawing the diagrams. (iii) Kolodny and Honig [62] proposed a method called VISTAL that mapped the C-spatial distances between two superposed proteins into a 2D sequence alignment which described the superposed structures as a series of matched secondary structure elements, colored according to the distance of their C atoms. 2D molecular graphics Besides the linear representation, protein structure annotation can also be performed using various fashions for different purposes. Wako and Scheraga [63] applied a distance constraint approach to 2D modeling of proteins in order to visualize the nature of protein folding and to examine the relative roles of different ranges of interaction. Campagne et al. [64–66] developed a web-based server RbDe that greatly simplifies the construction of schematic diagrams of proteins; the yielded residue-based diagrams displayed the sequence of a given protein in the context of its secondary and tertiary structure and highlighted the amino acid residues in different colors according to the physicochemical properties. Kannan et al. [67] proposed a logo-based strategy to express the information about neighbor composition densities for residues with different secondary structures, and this method was used to ascertain the reason why a particular sequence with a given amino-acid composition should be fold into a specific structural class. Furthermore, the solvent accessibility of amino acids in structured proteins is a critical property of proteins [68] and can be calculated by many approaches such as MSMS [69], NACCESS [70] and SAVOL3 [71]. However, there was only one tool called ASAView [72, 73] that can provide the schematic representation of accessible surface area for the independent amino acid residues in buried state, by using spiral plot or histogram. Another important application of protein structure annotation is the sequence logo which was originally described by Schneider and Stephens [74]. This method graphically represents the information present in a multiple sequence alignment. Each position in the logo is represented by a stack consisting of letters occurring at the corresponding position in the alignment. The height of each character is proportional to its frequency, where the most common character is placed at the top of the stack. The height of the entire stack indicates the information content available at that position. Currently, several tools for creating protein sequence logos are available via the web interfaces [75–77]. Sequence logo can also be used for DNA and RNA. Intra-protein nonbonding/bonding plot The global structure of proteins is stabilized by hydrophobic packing occurring in the cores, whereas the conformational specificity is accurately adjusted only by the forming or breaking of the bonding 251 (e.g. disulphide bridges) and nonbinding (e.g. hydrogen bonds) interactions between intra-protein elements. It is conceivable that the complicated interaction network of protein structures is difficult to be understood in the stereoviews. Therefore, several graph theory-based methods were proposed to visualize the disulphide bridges and hydrogen bond networks of proteins in the 2D pages. 2D visualization of hydrogen bond networks. HERA [78] is the first program for automatically plotting hydrogen bonding diagrams of protein structures. It can also be used to calculate the connectivity of -strands and to extract simple structural motifs such as hairpins or Greek keys. Another similar tool is HBNG [79] which gives an enumeration of favorable topologies of hydrogen bond networks in protein structures and determines the effect of cooperativity and anticooperativity on protein stability and folding. 2D visualization of disulphide bridges. Protein disulphide bridge topology has been investigated intensively [80–83] and is usually schematically represented by topology pattern plot. The topological stereoisomerism of multiple-disulfide polypeptide chain in stable folded proteins was first characterized from graph theoretic analysis of its covalent structure by Mao [84]. Based on that, Benham and Jafri [85] made a systematical study on disulfide bonding patterns and defined a simple linear graph to illustrate the linking patterns of disulfide bridges in protein sequences. The similar way was also used to analyze and visualize the topological stereochemistry of protein hydrogen bonds [86]. Protein surface map The original work of projecting biological surface onto a 2D plane was made by Rossmann and Palmenberg [87] who used the ‘roadmaps’ to depict the residue distribution on the capsids of human rhinovirus and mengovirus. Chapman [88] extended the application scope of ‘roadmaps’ to the external surfaces of proteins and gave a clear definition of the boundaries between labeled residues in the maps. Recently, researcher’s interest has been transferred from protein global surfaces to some concerned local regions. Stahl et al. [89] proposed a computer-based strategy for automatically mapping protein surface cavities. Using this approach they successfully examined 176 metalloproteinases containing zinc cations in the protein active sites. Lee and Tzou [90] 252 Zhou and Shang described an earth-map representation of the directionality of protein surfaces in the DNA-binding regions. Gabdoulline et al. [91] developed a tool MolSurfer for analyzing, characterizing and visualizing the physicochemical properties of protein– protein-binding interfaces in 2D pages. Transmembrane protein display Transmembrane proteins play many important and diverse functional roles in cells and biological processes [92]. A number of methods and tools were reported for producing 2D transmembrane diagrams from protein primary sequences or advanced structures, such as the above mentioned RbDe [93] which can also be used to generate residue-based schematic diagram of transmembrane proteins. Moreover, the G protein-coupled receptor database (GPCRDB) [94] provides a 2D visualization tool Viseur [95] to display snake like diagram for the queried GPCRs. VHMPT [96] is a transmembrane topology plotting program that allows users to modify the layout of the generated topology, to label specific amino acid or amino-acid groups, and to annotate with arrows and texts. TMRPres2D [97] automates the generation of uniform, high analysis 2D graphical images of -helical or -barrel transmembrane proteins. TOPO2 [98] is a simple graphics program using a sequence along with input provided by the user to produce an image of a transmembrane protein. TEXtopo [99] creates the transmembrane topologies from PHD predictions [100] or SwissProt database files [101], and the output diagrams use shades and rich decorations to emphasize conserved residues or functional properties of the residue side-chains. Representation and annotation of nucleic acids 2D representation of DNA The mostly used 2D representation of DNA is, just like proteins, the linear representation of which the main applications are concentrated on sequence characteristics display and alignment plotting, these functions are available in many software suites such as BioViews [102], viewGene [103], SeqVISTA [104], etc. DNA sequence logo is also a popular fashion for annotating nucleotide’s conservation or other properties [105]. For example, the Sequencewalker [106] is a special tool adopting logo style to display how binding proteins and other macromolecules interact with individual bases of a given nucleotide sequences. Another important aspect of DNA 2D representation is the random walk plot in which the four nucleotides A/T and C/G are served as movers indicating respectively left/right and up/down moves, a DNA sequence can thus ‘walk’ on a 2D plane, from 50 to 30 , to form a polyline plot [107–109]. However, the main uses of DNA walk plot are in sequence characterization [110] and analysis [111], rather than the 2D visualization. Moreover, there has been a simple but interesting instance of visualizing complete genomes, called DNA rainbow [112]. This method is a beautiful mix of science and art, which assigns a color to every of the four bases in a chromosome and renders it as a multicoloured picture. In a previous review, Roy et al. [113] have given a detail description of the DNA graphical representation. 2D representation of RNA The focus of RNA 2D representation is how to generate non-overlapping arrangement and layout of RNA secondary structure elements under an aesthetically pleasing consideration. In a mathematical sense, the secondary structure of RNA is a topological structure rather than a geometric structure. It remains unchanged even when under distortion, as long as the connectivity relation between nucleotides is not changed. Han et al. [114] have classified the RNA 2D representation methods into conventional and unconventional representations. The published works on 2D representation of RNA are listed in Table S1 in Supplementary Data. Conventional representation of RNA. The conventional representation illustrates a polygonal display of RNA global structure in which loops are drawn as regular polygons, thereby generating clear and compact representation, especially for the long RNAs. In the early stage, Lapalme et al. [115] presented a simple algorithm for short RNA segments. Shapiro et al. [116] later developed a method to circumvent the problem of overlapping portions of the RNA. Bruccoleri and Heinrich [117] described an improved algorithm that is particularly suitable for drawing large RNA molecules with very limited overlap of strands. Gautheret et al. [118] gave a brief review on the early studies of RNA display. In the 1990s, RNA 2D representation obtained a rapid development in both methodology and software. These works laid emphasis on different aspects ranging from the generation of overlap-free polygonal displays [119] 2D molecular graphics 253 and richly annotated diagrams [120] to development of general graphic libraries [121] and efficient computer algorithms [122]. After 2000, RNA conventional representation has gradually grown to be mature. Recently, it trends to developing combinatorial application suites that assemble a number of modules with diverse functions to automatically implement the complete procedure of RNA representation, including sequence alignment, energy optimization, secondary prediction, and so on. Unconventional representation of RNA. The unconventional representation pays attention to local characteristics of RNA structures (e.g. loops, domes, pseudoknots, base pairs, etc.) or uses special fashions to define RNA global structures. Compared to conventional representation, methods and tools for unconventional representation are relatively few but developing rapidly in recent years. Martinez [123] early proposed a multiple approach for identifying and depicting local structures in the form of RNA hairpins. After that, Han et al. [124] presented an algorithm drawing H-type pseudoknots based on RNA graph theory, and this method was further improved in several aspects including pseudoknot types, display quality and computational efficiency [125]. Another important tool in the RNA unconventional representation area is the RNAView [126], which creates a schematically diagram for all base pairs in a RNA molecule using previously developed theory and symbols by the same group [127, 128]. RNAView was further modified to Rna2DViewer and integrated into the S2S package [129]. Like the RNAView, MC-Annotate [130] can also be used to annotate information about RNA local structures. The output of this program is composed of a 2D structural graph that contains the annotations, and a series of HTML documents, one for each nucleotide conformation and base–base interaction present in the input 3D structure. Moreover, the tool of jViz.Rna developed by Wiese et al. [131] is a Java program that represents RNA global structures using various strange patterns, such as linked graph, circle graph, dot plot, etc. In this literature they also gave a brief description of some useful RNA visualization tools. enzyme–substrate, receptor–ligand, target–drug, etc. Knowledge of these interactions is important to understanding how proteins function in biological systems and to developing drugs with specific selectivity. 2D representation of protein–small molecule interactions dedicates to straightforwardly demonstrating the hidden active sites and the complicated nonbonding networks across the binding interfaces. However, only three tools are available that can be used to automatically generate schematic 2D diagrams of protein–small molecule interactions. LIGPLOT is a well-known program written by Wallace et al. [132]. It employs HBPLUS [133] and NACCESS [70] to identify and compute hydrogen bonds and solvent accessible areas around the ligand molecule, and subsequently arranging these information into a neat page containing a complete description of the nonbinding effects on the given complex’s interface (Figure 3a). LIGPLOT adopts a series of algorithms to avoid overlapping and to flatten ring groups, and provides diverse output styles that can be easily changed by users. In the MOE platform, Clark and Labute [134] employed scientific vector language (SVL) to fulfill the procedure of automatic depiction of protein– ligand complexes. The output diagrams are annotated with a substitution contour, solvent exposure, cofacters, covalently bound linkages and a series of identified nonbonding interactions and conserved residues (Figure 3b). This method can also be applied to aligned sets which contain multiple ligands, or multiple members of a protein family, in which case the ligand orientations and protein residue placement will show consistent trends throughout the series. Stierand et al. [135, 136] described an algorithm that is based on a combinatorial optimization strategy which can solve parts of the layout overlapping problem nonheuristically. The depicted molecules are represented as structure diagrams according to their chemical nomenclatures and arranged to maximize aesthetic ideals (Figure 3c). This method is called PoseView and available as a web server by which the complete protocol of layout procedure can be implemented efficiently and conveniently. Schematic layout of molecular interactions Protein^nucleic acid interaction Today, only a single software tool for the automatic creation of protein–nucleic acid complex diagrams called NUCPLOT [138] is available. It is a standalone program that can automatically identify Protein^small molecule interaction Protein–small molecule interactions are accompanied with the binding and association processes of 254 Zhou and Shang Figure 3: 2D representation of the cyclin-dependent kinase 2 (CDK2)-oxindole inhibitor complex (PDB entry: 1fvt) (below).These diagrams were generated using (a) LIGPLOT [132], (b) MOE [134] and (c) PoseView [135].The stereoview of the complex (above) was produced using CHIMERA [137]. interfacial residues and interactions between them from the input PDB coordinates of the complex, and generates a plot in which all interaction information are intuitively shown in a schematic manner. The program creates the output files consisting of nucleic acid chains, interfacial elements, hydrogen bonds, van der Waal contacts and covalent bonds between the protein and the nucleic acid. The resulting diagram is both clear and simple and allows immediate identification of important interactions within the structure. The program works well for both doublestranded and single-stranded nucleic acids binding with proteins, so it can be used for both cases of protein–DNA complexes and partial protein–RNA complexes. Protein^protein interaction The ready availability of structural data of protein complexes, both from experimental determination, such as by X-ray crystallography, and by theoretical modeling, such as protein docking, has made it necessary to find ways to easily interpret the results at 2D level. The simplest way used for 2D representation of protein–protein interactions is the linking plot in which the protein sequences are represented as two lines, each from one protein subunit of the complex, and the residue-pairs in interactions are linked linearly. Several methods like MONSTER [139] and iPfam [140] can be used to automatically generate linking plots from protein complex structures. The main disadvantage of linking plot is that the presenting information is too cursory to give a comprehensive insight into the protein-binding interfaces. For that, QContacts [141] adopts a different strategy called contacting map to exhibit nonbonding interactions in the binding interfaces. This map can be viewed as a matrix; residues in turn from one subunit sequence define the columns, and those from another define the rows, the matrix elements thus represent the corresponding residue-pairs and, if in interaction, labeled by different symbols indicating different interaction types. A more aesthetically pleasing, informative diagram showing protein–protein interactions can be generated by using DIMPLOT [132], which is a sister program of above-mentioned LIGPLOT and thus its function and output style are same to that of LIGPLOT. Recently, our group developed a graphical user interface (GUI)-based package 2D-GraLab [142] for 2D molecular graphics automatically generating comprehensive representation of diverse nonbonding (and covalent) interactions across the protein-binding interfaces, including hydrogen bonds, salt bridges, van der Waals interactions, hydrophobic contacts, – stackings, disulfide bonds, desolvation effect and loss of conformational entropy. This program takes standard PDB format as the input file and gives two types of PostScript outputs, referring to individual schematic diagram (for each nonbinding type) and summarized schematic diagram (for all nonbinding types). In addition, 2D-GraLab provides a complete protocol to define and calculate geometric and energetics parameters for different nonbonding types, such as length, angle, free energy, stability, etc., and subsequently presents these parameters with interfacial structure elements in an elaborately designed page. To ensure these interactions are determined accurately and reliably, methods and standalone programs employed in 2D-GraLab are all widely used in the chemistry and biology communities. CONCLUSIONS 2D molecular graphics is the subject that uses an intuitive, understandable and accurate manner to present configuration and information of chemical and biological systems in planar pages. It should be always noted that the two points: (i) The data and parameters used by 2D molecular graphics are all taken from 3D structures and models, and further mapped into 2D representations by a consensus algorithm. Therefore, the diagrams of 2D molecular graphics are different from those of inexact drafted pictures that are used just for schematic demonstration. (ii) 2D molecular graphics lays emphasis on graphic representation, rather than being severed as calculation tool. Therefore, those studies using 2D graph and geometry just for the purposes of mathematical tool or modeling approach, such as chemical graph theory and DNA topological theory, do not belong to the spectrum of 2D molecular graphics. Key Points Propose the term of 2D molecular graphics. Review methods, tools and applications of 2D molecular graphics. Define the difference between the 2D molecular graphics and other similar subjects. 255 Acknowledgements The authors would like to express their gratitude to the editors and the anonymous reviewers for their professional, intensive and useful comments. They also thank the people for providing access of pictures, programs, servers and databases used in this article. SUPPLEMENTARY DATA Supplementary data are available online at http:// bib.oxfordjournals.org/. References 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. Ernesto Z. TheTheory of Graphs in Linguistics. The Hague: Mouton Press, 1970. Goodsell DS. Visual methods from atoms to cells. Structure 2005;13:347–54. Leach AR. Molecular Modelling: Principles and Applications (2nd Ed). UK: Pearson Education EMA, 2001. Bensaude-Vincent B, Stengers I. A History of Chemistry. Cambridge: Harvard University Press, 1997. ChemDraw. CambridgeSoft Corp., MA, USA. Symyx/Draw. Symyx Technologies, Inc., CA, USA. Ogham. OpenEye Scientific Software, Inc., NM, USA. Ihlenfeldt WD, Takahashi Y, Abe H, et al. Computation and management of chemical-properties in CACTVS: an extensible networked approach toward modularity and compatibility. J Chem Inf Comput Sci 1994;34:109–16. Clark AM, Labute P, Santavy M. 2D structure depiction. J Chem Inf Model 2006;46:1107–23. Zimmerman BL. Computer-Generated Chemical Structural Formulas with Standard Ring Orientations, PhD Dissertation, University of Pennsylvania, 1971. Wipke WT, Heller S, Feldmann R, et al. Computer Representation and Manipulation of Chemical Information. New York: Wiley, 1974. Carhart RE. A model-based approach to the teletype printing of chemical structures.. J Chem Inf Comput Sci 1976; 16:82–8. Dittmar PG, Mockus J, Couvreur KM. An algorithmic computer graphics program for generating chemical structure diagrams. J Chem Inf Comput Sci 1977;17:186–92. Blake JE, Farmer NA, Haines RC. An interactive computer graphics system for processing chemical structure diagrams. J Chem Inf Comput Sci 1977;17:223–8. Shelley CA. Heuristic approach for displaying chemical structures. J Chem Inf Comput Sci 1983;23:61–5. Hibbert DB. Generation and display of chemical structures by genetic algorithms. Chemometr Intel Lab Syst 1993;20: 35–43. Weininger D. SMILES. 3. DEPICT. Graphical depiction of chemical structures. J Chem Inf Comput Sci 1990;30:237–43. Helson HE. Structure diagram generation. Rev Comput Chem 1999;13:313–98. Boissonnat JD, Cazals F, Flötotto J. 2D-structure drawings of similar molecules. Proc 8th Int Symp Graph Draw 2000; 115–126. 256 Zhou and Shang 20. Fricker PC, Gastreich M, Rarey M. Automated drawing of structural molecular formulas under constraints. J Chem Inf Comput Sci 2004;44:1065–78. 21. Maehr H. Graphic representation of configuration in two-dimensional space. current conventions, clarifications, and proposed extensions. J Chem Inf Comput Sci 2002;42: 894–902. 22. Schüttelkopf AW, van Aalten DMF. PRODRG: a tool for high-throughput crystallography of protein-ligand complexes. Acta Cryst 2004;D60:1355–63. 23. Krause S, Willighagen E, Steinbeck C. JChemPaint, using the collaborative forces of the internet to develop a free editor for 2D chemical structures. Molecules 2000;5:93–8. 24. Trepalin SV, Yarkov AV, Pletnev IV, et al. A Java chemical structure editor supporting the modular chemical descriptor language (MCDL). Molecules 2006;11:219–231. 25. Crippen GM. Topology of globular proteins. J Theor Biol 1974;45:327–38. 26. Crippen GM. Topology of globular proteins. II. JTheor Biol 1975;51:495–500. 27. Gilbert D, Westhead D, Nagano N, et al. Motif-based searching in TOPS protein topology databases. Bioinformatics 1999;15:317–26. 28. Martin AC. The ups and downs of protein topology; rapid comparison of protein structure. Protein Eng 2000;13: 829–37. 29. Johannissen LO, Taylor WR. Protein fold comparison by the alignment of topological strings. Protein Eng 2003;12: 949–55. 30. Schulz GE, Schirmer RH. Topological comparison of adenyl kinase with other proteins. Nature 1974;250: 142–4. 31. Rossmann MG, Moras D, Olsen KW. Chemical and biological evolution of a nucleotide-binding protein. Nature 1974;250:194–9. 32. Levitt M, Chothia C. Structural patterns in globular proteins. Nature 1976;261:552–7. 33. Naganoa K. Logical analysis of the mechanism of protein folding. IV. Super-secondary structures. J Mol Biol 1977;109: 235–50. 34. Sternberga MJE, Thorntona JM. On the conformation of proteins: the handedness of the connection between parallel -strands. J Mol Biol 1977;110:269–83. 35. Flores TP, Moss DS, Thornton JM. An algorithm for automatically generating protein topology cartoons. Protein Eng 1994;7:31–7. 36. Westhead DR, Hatton DC, Thornton JM. An atlas of protein topology cartoons available over the world wide web. Trends Biochem Sci 1998;23:35–6. 37. Westhead DR, Slidel TW, Flores TP, et al. Protein structural topology: automated analysis and diagrammatic representation. Protein Sci 1999;8:897–904. 38. Michalopoulos I, Torrance GM, Gilbert DR, et al. TOPS: an enhanced database of protein structural topology. Nucleic Acids Res 2004;32:D251–4. 39. May P, Barthel S, Koch I. PTGL—a web-based database application for protein topologies. Bioinformatics 2004;20: 3277–9. 40. Jayaram H, Taraporewala Z, Patton JT, et al. Rotavirus protein involved in genome replication and packaging exhibits a HIT-like fold. Nature 2002;417:311–5. 41. Bewley MC, Graziano V, Jiang J, et al. Structures of wild-type and mutant human spermidine/spermine N1-acetyltransferase, a potential therapeutic drug target. Proc Natl Acad Sci USA 2006;103:2063–8. 42. He X, Zhou J, Bartlam M, et al. Crystal structure of the polymerase PAC–PB1N complex from an avian influenza H5N1 virus. Nature 2008;454:1123–6. 43. Collaborative Computational Project. The CCP4 suite: programs for protein crystallography. Acta Cryst 1994;D50: 760–3. 44. Bond CS. TopDraw: a sketchpad for protein structure topology cartoons. Bioinformatics 2003;19:311–2. 45. Richardson JS. -sheet topology and the relatedness of proteins. Nature 1977;268:495–500. 46. Koch I, Kaden F, Selbig J. Analysis of protein sheet topologies by graph-theoretical methods. Proteins 1992;12: 314–23. 47. Flower DR. -sheet topology. A new system of nomenclature. FEBS Lett 1994;344:247–50. 48. Grigoriev I, Mironov A, Rakhmaninova A. Inter-helical contacts determining the architecture of -helical globular proteins. J Biomol Struct Dyn 1994;12:559–72. 49. Zhang C, Kim S-H. A comprehensive analysis of the Greek key motifs in protein -barrels and -sandwiches 2000;40: 409–19. 50. DeLano WL (2002), The PyMOL Molecular Graphics System. DeLano Scientific, San Carlos, CA, USA. 51. Laskowski RA, Hutchinson EG, Michie AD, et al. PDBsum: a web-based database of summaries and analyses of all PDB structures. Trends Biochem Sci 1997;22: 488–90. 52. Luscombe NM, Laskowski RA, Westhead DR, et al. New tools and resources for analysing protein structures and their interactions. Acta Cryst 1998;D54:1132–8. 53. Delamarche C. Color and graphic display (CGD): Programs for multiple Sequence alignment analysis in spreadsheet software. BioTechniques 2000;29:100–7. 54. Li W, Godzik A. VISSA: A program to visualize structural features from structure sequence alignment. Bioinformatics 2006;22:887–8. 55. Kyte J, Doolittle RF. A simple method for displaying the hydrophathic character of a protein. J Mol Biol 1982;157: 105–32. 56. Ragone R, Facchiano F, Facchiano A, et al. Flexibility plot of proteins. Protein Eng 1989;2:497–504. 57. Laskowski RA, MacArthur MW, Moss DS, et al. PROCHECK—a program to check the stereochemical quality of protein structures. J Appl Crystallogr 1993;26: 283–91. 58. Genetics Computer Group, Science Drive, Madison, WI, USA. 59. Raghava GPS. A graphical web server for the analysis of protein sequences and alignment. Biotech Software Internet Rep 2001;2:255–8. 60. Todd AE, Orengo CA, Thornton JM. DOMPLOT: a program to generate schematic diagrams of the structural domain organization within proteins, annotated by ligand contacts. Protein Eng 1999;12:375–9. 61. Sipos B, Somogyi K, Andó I, et al. t2prhd: a tool to study the patterns of repeat evolution. BMC Bioinformatics 2008;9:27. 2D molecular graphics 62. Kolodny R, Honig B. VISTAL—a new 2D visualization tool of protein 3D structural alignments. Bioinformatics 2006; 22:2166–7. 63. Wako H, Scheraga HA. Visualization of the nature of protein folding by a study of a distance constraint approach in two-dimensional models. Biopolymers 1982;21:611–32. 64. Konvicka K, Campagne F, Weinstein H. Interactive construction of residue-based diagrams of proteins: the RbDe web service. Protein Eng 2000;13:395–6. 65. Campagne F, Bettler E, Vriend G, et al. Batch mode generation of residue-based diagrams of proteins. Bioinformatics 2003;19:1854–5. 66. Skrabanek L, Campagne F, Weinstein H. Building protein diagrams on the web with the residue-based diagram editor RbDe. Nucleic Acids Res 2003;31:3856–8. 67. Kannan N, Schneider TD, Vishveshwara S. Logos for amino-acid preferences in different backbone packing density regions of protein structural classes. Acta Cryst 2000; D56:1156–65. 68. Connolly ML. Solvent-accessible surfaces of proteins and nucleic-acids. Science 1983;221:709–13. 69. Sanner MF, Olson AJ, Spehner JC. Reduced surface: an efficient way to compute molecular surfaces. Biopolymers 1996;38:305–20. 70. Hubbard SJ, Thornton, JM. NACCESS. Department of Biochemistry and Molecular Biology, University College London, UK. 71. Pearlman RS, Skell JM. SAVOL3. College of Pharmacy, University of Texas, Austin, TX, USA. 72. Ahmad S, Wang J-Y, Gromiha MM, et al. Development of residue look-up tables and graphical representation of solvent accessibility in proteins. Genome Informatics 2003;14: 482–3. 73. Ahmad S, Gromiha M, Fawareh H, et al. ASAView: database and tool for solvent accessibility representation in proteins. BMC Bioinformatics 2004;5:51. 74. Schneider TD, Stephens MR. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res 1990;18: 6097–100. 75. Henikoff S, Henikoff JG, Alford WJ, et al. Automated construction and graphical presentation of protein blocks from unaligned sequences. Gene 1995;163:GC17–26. 76. Crooks GE, Hon G, Chandonia J-M, et al. WebLogo: a sequence logo generator. Genome Res 2004;14:1188–90. 77. Pérez-Bercoff Å, Koch J, Bürglin TR. LogoBar: bar graph visualization of protein logos with gaps. Bioinformatics 2006; 22:112–4. 78. Hutchinson EG, Thornton JM. HERA—a program to draw schematic diagrams of protein secondary structures. Proteins 1990;8:203–12. 79. Tiwari A, Tiwari V. HBNG: graph theory based visualization of hydrogen bond networks in protein structures. Bioinformation 2007;2:28–30. 80. Kikuchi T, Nemethy G, Scheraga H. Spatial geometric arrangements of disulfide-crosslinked loops in proteins. J Comput Chem 1986;7:61–88. 81. Kikuchi T, Nemethy G, Scheraga H. Spatial geometric arrangements of disulfide-crosslinked loops in non-planar proteins. J Comput Chem 1989;10:287–94. 82. Boisbouvier J, Blackledge M, Sollier A, et al. Simultaneous determination of disulphide bridge topology and 257 three-dimensional structure using ambiguous intersulphur distance restraints: Possibilities and limitations. J Biomol NMR 2000;16:197–208. 83. Li X, Zou G, Yuan W, et al. Defining the native disulfide topology in the somatomedin B domain of human vitronectin. J Biol Chem 2007;282:5318–26. 84. Mao B. Molecular topology of multiple-disulfide polypeptide chains. J Am Chem Soc 1989;111:6132–6. 85. Benham CJ, Jafri MS. Disulfide bonding patterns and protein topologies. Protein Sci 1993;2:41–54. 86. Mao B, Chou K, Maggiora G. Topological analysis of hydrogen bonding in protein structure. EurJ Biochem 1990; 188:361–5. 87. Rossmann MG, Palmenberg AC. Conservation of the putative receptor attachment site in picornaviruses. Virology 1988;164:373–82. 88. Chapman MS. Mapping the surface properties of macromolecules. Protein Sci 1993;2:459–69. 89. Stahl M, Taroni C, Schneider G. Mapping of protein surface cavities and prediction of enzyme class by a selforganizing neural network. Protein Eng 2000;13:83–8. 90. Lee W-P, Tzou W-S. Molecular surface directionality of the DNA-binding protein surface on the earth map. Genet Mol Biol 2006;29:408–12. 91. Gabdoulline RR, Wade RC, Walther D. MolSurfer: a macromolecular interface navigator. Nucleic Acids Res 2003; 31:3349–51. 92. Byrne B, Iwata S. Membrane protein complexes. Curr Opin Struct Biol 2002;12:239–43. 93. Campagne F, Weinstein H. Schematic representation of residue-based protein context-dependent data: an application to transmembrane proteins. J Mol Graph Model 1999;17: 207–13. 94. Horn F, Weare J, Beukers MW, et al. GPCRDB: an information system for G protein-coupled receptors. Nucleic Acids Res 1998;26:275–9. 95. Campagne F, Jestin R, Reversat JL, et al. Visualisation and integration of G protein-coupled receptor related information help the modelling: description and applications of the Viseur program. J Comput Aid Mol Des 1999;13: 625–43. 96. Lin W-J, Hwang M-J. VHMPT: a graphical viewer and editor for helical membrane protein topologies. Bioinformatics 1998;14:866–8. 97. Spyropoulos IC, Liakopoulos TD, Bagos PG, et al. TMRPres2D: high quality visual representation of transmembrane protein models. Bioinformatics 2004;20: 3258–60. 98. Johns SJ. TOPO2. University of California. San Francisco, USA. 99. Beitz E. TEXtopo: shaded membrane protein topology plots in LATEX2". Bioinformatics 2000;16:1050–1. 100. Rost B. PHD: prediction one-dimensional protein structure by profile based neural networks. Meth Enzymol 1996;266:525–39. 101. Bairoch A, Apweiler R. The SWISS-PROT protein sequence data bank and its supplement TREMBL. Nucleic Acids Res 1997;25:31–6. 102. Helt GA, Lewis S, Loraine AE, et al. BioViews: Java-based tools for genomic data visualization. Genome Res 1998;8: 291–305. 258 Zhou and Shang 103. Kashuk C, SenGupta S, Eichler E, et al. viewGene: a graphical tool for polymorphism visualization and characterization. Genome Res 2002;12:333–8. 104. Hu Z, Frith M, Niu T, et al. SeqVISTA: a graphical tool for sequence feature visualization and comparison. BMC Bioinformatics 2003;4:1. 105. Workman CT, Yin Y, Corcoran DL, et al. enoLOGOS: a versatile web tool for energy normalized sequence logos. Nucleic Acids Res 2005;33:W389–92. 106. Schneider TD. Sequence walkers: a graphical method to display how binding proteins interact with DNA or RNA sequences. Nucleic Acids Res 1997;25:4408–15. 107. Leong PM, Morgenthaler S. Random walk and gap plots of DNA sequences. Comput Appl Biosci 1995;11:503–7. 108. Berger JA, Mitra SK, Carli M, et al. Visualization and analysis of DNA sequences using DNA walks. J Franklin Institute 2004;341:37–53. 109. Kim MA, Lee EJ, Cho HG, et al. A visualization technique for DNA walk plot using -convex hull. Proc Fifth Int Conf Central Europe Comput GraphVisualization 1997;10–14. 110. Randić R, Vračko M, Nandy A, et al. On 3-D graphical representation of DNA primary sequences and their numerical characterization. J Chem Inf Comput Sci 2000;40: 1235–44. 111. Thomas JM, Horspool D, Brown G, et al. GraphDNA: a Java program for graphical display of DNA composition analyses. BMC Bioinformatics 2007;8:21. 112. Bierkandt K, Bierkandt J. DNA rainbow. [http:// www.dna-rainbow.org] (21 March 2008, date last accessed). 113. Roy A, Raychaudhury C, Nandy A. Novel techniques of graphical representation and analysis of DNA sequences— a review. J Biosci 1998;23:55–65. 114. Han K, Kim D, Kim H-J. A vector-based method for drawing RNA secondary structure. Bioinformatics 1999;15: 286–97. 115. Lapalme G, Cedergren RJ, Sankoff D. An algorithm for the display of nucleic acid secondary structure. Nucleic Acids Res 1982;20:8351–6. 116. Shapiro BA, Maizel J, Lipkin LE, et al. Generating nonoverlapping displays of nucleic acid secondary structure. Nucleic Acids Res 1984;12:75–88. 117. Bruccoleri RE, Heinrich G. An improved algorithm for nucleic acid secondary structure display. Comput Appl Biosci 1988;4:167–73. 118. Gautheret D, Major F, Cedergren R. Computer modeling and display of RNA secondary and tertiary structures. Methods Enzymol 1990;183:318–30. 119. Perochon-Dorisse J, Chetouani F, Aurel A, et al. RNA_d2: a computer program for editing and display of RNA secondary structures. Comput Appl Biosci 1995;11:101–9. 120. Chetouani F, Monestié P, Thébault P, et al. ESSA: an integrated and interactive computer tool for analysing RNA secondary structure. Nucleic Acids Res 1997;25:3514–22. 121. Muller G, Gaspin C, Etienne A, et al. Automatic display of RNA secondary structures. Comput Appl Biosci 1993;9: 551–61. 122. Auber D, Delest M, Domenger JP, et al. Efficient drawing of RNA secondary structure. J Graph Algorithms Appl 2006; 10:329–51. 123. Martinez HM. An RNA secondary structure workbench. Nucleic Acids Res 1988;16:1789–98. 124. Han K, Lee Y, Kim W. PseudoViewer: automatic visualization of RNA pseudoknots. Bioinformatics 2002;18: S321–8. 125. Han K, Byun Y. PseudoViewer2: visualization of RNA pseudoknots of any type. Nucleic Acids Res 2003;31: 3432–40. 126. Yang H, Jossinet F, Leontis N, et al. Tools for the automatic identification and classification of RNA base pairs. Nucleic Acids Res 2003;31:3450–60. 127. Leontis N, Westhof E. Geometric nomenclature and classification of RNA base pairs. RNA 2001;7:499–512. 128. Leontis N, Westhof E. The annotation of RNA motifs. Comp Funct Genom 2002;3:518–24. 129. Jossinet F, Westhof E. Sequence to structure (S2S): display, manipulate and interconnect RNA data from sequence to structure. Bioinformatics 2005;21:3320–1. 130. Gendron P, Lemieux S, Major F. Quantitative analysis of nucleic acid three-dimensional structures. J Mol Biol 2001; 308:919–36. 131. Wiese KC, Glen E, Vasudevan A. jViz.Rna—a Java tool for RNA secondary structure visualization. IEEE Trans Nanobiosci 2005;4:212–8. 132. Wallace AC, Laskowski RA, Thornton JM. LIGPLOT: a program to generate schematic diagrams of protein-ligand interactions. Protein Eng 1995;8:127–34. 133. McDonald IK, Thornton JM. Satisfying hydrogen bonding potential in proteins. J Mol Biol 1994;238:777–93. 134. Clark AM, Labute P. 2D depiction of protein-ligand complexes. J Chem Inf Model 2007;47:1933–44. 135. Stierand K, Maaß PC, Rarey M. Molecular complexes at a glance: automated generation of two-dimensional complex diagrams. Bioinformatics 2006;22:1710–6. 136. Stierand K, Rarey M. From modeling to medicinal chemistry: automatic generation of two-dimensional complex diagrams. ChemMedChem 2007;2:853–60. 137. Pettersen EF, Goddard TD, Huang CC, et al. UCSF Chimera – A visualization system for exploratory research and analysis. J Comput Chem 2004;25:1605–12. 138. Luscombe NM, Laskowski RA, Thornton JM. NUCPLOT: a program to generate schematic diagrams of protein–nucleic acid interactions. Nucleic Acids Res 1997;25: 4940–5. 139. Salerno WJ, Seaver SM, Armstrong BR, et al. MONSTER: inferring non-covalent interactions in macromolecular structures from atomic coordinate data. Nucleic Acids Res 2004;32:W566–8. 140. Finn RD, Marshall M, Bateman A. iPfam: visualization of protein–protein interactions in PDB at domain and amino acid resolutions. Bioinformatics 2005;21:410–2. 141. Fischer TB, Holmes JB, Miller IR, et al. Assessing methods for identifying pair-wise atomic contacts across binding interfaces. J Struct Biol 2006;153:103–12. 142. Zhou P, Tian F, Shang Z. 2D depiction of nonbonding interactions for protein complexes. J Comput Chem, 2009;30:940–51.
© Copyright 2025 Paperzz