Special Focus Issue – Databases KNApSAcK-3D: A Three-Dimensional Structure Database of Plant Metabolites Kensuke Nakamura1,*, Naoki Shimura1, Yuuki Otabe1, Aki Hirai-Morita2, Yukiko Nakamura2, Naoaki Ono2, Md Altaf Ul-Amin2 and Shigehiko Kanaya2,* 1 Department of Life Science and Informatics, Maebashi Institute of Technology, 460-1 Kamisadori-machi, Maebashi-City, Gunma, 371-0816 Japan 2 Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma-shi, Nara, 630-0192 Japan *Corresponding authors: Kensuke Nakamura, E-mail, [email protected] or [email protected]; Shigehiko Kanaya, E-mail, [email protected] (Received October 31, 2012; Accepted December 13, 2012) Studies on plant metabolites have attracted significant attention in recent years. Over the past 8 years, we have constructed a unique metabolite database, called KNApSAcK, that contains information on the relationships between metabolites and their expressing organism(s). In the present paper, we introduce KNApSAcK-3D, which contains the three-dimensional (3D) structures of all of the metabolic compounds included in the original KNApSAcK database. The 3D structure for each compound was optimized using the Merck Molecular Force Field (MMFF94), and a multiobjective genetic algorithm was used to search extensively for possible conformations and locate the global minimum. The resulting set of structures may be used for docking studies to identify new and potentially unexpected binding sites for target proteins. The 3D structures may also be utilized for more qualitative studies, such as the estimation of biological activities using 3D-QSAR. The database can be accessed via a link from the KNApSAcK Family website (http://kanaya.naist.jp/KNApSAcK_Family/) or directory at http://kanaya.naist.jp/knapsack3d/. Keywords: Database KNApSAcK Species-metabolite relationship Force field Conformation analysis 3D Pharmacophore. Abbreviations: DB, database; 3D-QSAR, three-dimensional quality structure–activity relationship; MDL, Molecular Design Limited; MMFF, Merck Molecular Force Field; MS, mass spectrometry; SBDD, structure-based drug design. Introduction Life systems can be thought of as enormous flows of interacting, interconverting chemical substances of various kinds. While biopolymers (e.g. proteins and nucleic acids) play primary active roles in this phenomenon as enzymes and their controlling factors, the actual players are chemical substances called metabolites. In general, metabolites are categorized into two groups: primary and secondary metabolites. Primary metabolites are involved in common pathways governing growth, development and reproduction, and are shared among most (or at least a considerable proportion of) living organisms. Secondary metabolites, on the other hand, are restricted to much smaller groups of organisms or even to single organisms. Secondary metabolites, which are often called natural products, are typically derived from primary metabolites, have extremely diversified structures, and are associated with a wide variety of specialized biological functions (e.g. inter- or intraspecies communication; toxicity against potential predators; antifungal/antibiotic activities; and resistance to environmental factors, such as temperature, light, radiation and harmful substances such as active oxygen species and heavy metals). Most of these biological activities increase the chance of survival in the face of many life-threatening pressures encountered in the natural environment. Plants, fungi and microorganisms are rich sources of secondary metabolites with unique biological activities. The most convincing explanation for the abundance of secondary metabolites in these organisms is that while animals are able to change their locations to avoid predators or leave harsh environments, plants, fungi and microorganisms usually cannot. Thus, they have developed various chemical means of protecting themselves. For instance, many plants produce alkaloids that are toxic to most animals, and thereby avoid being eaten by these animals. However, a recent study found that some fruit-bearing plants lack several of the Cyt P450 gene subfamilies responsible for alkaloid production (Sato et al. 2012). This is an interesting example of a trade-off, wherein a plant might disarm itself because the seed dispersal advantage overcomes the disadvantage of being eaten. The importance of metabolite research in plant science is emphasized by the increasing number of studies that have been Plant Cell Physiol. 54(2): e4(1–8) (2013) doi:10.1093/pcp/pcs186, available online at www.pcp.oxfordjournals.org ! The Author 2013. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: [email protected] Plant Cell Physiol. 54(2): e4(1–8) (2013) doi:10.1093/pcp/pcs186 ! The Author 2013. 1 K. Nakamura et al. reported in recent years. For instance, there have been recent advances in the research on strigolactones and their roles in parasitic or symbiotic interactions with plants and arbuscular mycorrhizal fungi (Seto et al. 2012). Strigolactones have also been found to be involved in controlling the growth of tillers in rice (Luo et al. 2012). Seo et al. (2012) reported that two compounds from tobacco, sclareol and cis-abienol (KNApSAcK-3D IDs C00000894 and C00000875, respectively), can effectively inhibit bacterial wilt disease caused by Ralstonia solanacearum. Jeknic et al. (2012) reported the functional characterization of capsanthin-capsorubin synthase in tiger lily, which affects the color of the flower by controlling the production of capsanthin and capsorubin (KNApSAcK-3D IDs C00003763 and C00003764, respectively) (Jeknic et al. 2012). One of the interesting trends in metabolic research is to focus on the regulation of metabolite contents in response to environmental factors, such as light. Hirose et al. (2012) reported the reduction of active gibberellin contents in rice seedlings under light irradiation, while Mewis et al. (2012) reported changes in the secondary metabolite profile of broccoli sprouts in response to UV-B irradiation. Among the biological activities of secondary metabolites, those related to human diseases are particularly interesting from a practical point of view. These include antifungal, antibiotic, anti-inflammatory and antitumor activities, as well as those that influence the nervous system (i.e. anesthetics or stimulants). For almost a decade, we have been developing a database of plant secondary metabolites. This database, called KNApSAcK (Takahashi et al. 2011, Afendi et al. 2012), currently features information on 101,500 species–metabolite pairs encompassing 20,741 species and 50,048 metabolites. KNApSAcK is still under development, as we are constantly adding information on new metabolites, their biological functions and the geographical distributions of plant use in medicines and/or food (Okada et al. 2010). One of the most important applications of KNApSAcK to date is the assignment of mass spectrometry (MS) signals for metabolomic studies (Oishi et al. 2009). Several groups have developed methods and tools (e.g. initial structure generation and choice of potential energy functions) for generating three-dimensional (3D) structures for small organic molecules (Sadowski and Gasteiger 1994, Miteva et al. 2010), often for use in structure-based drug design (SBDD). One of the authors of the present paper was involved in developing such a tool (Yamada et al. 2006), and thus has a good sense of whether structures generated would be useful for SBDD and docking studies. In the present work, we employed the Balloon software (Vainio and Johnson 2007, Puranen et al. 2010) with the Merck Molecular Force Field (MMFF94) (Halgren 1996) to generate 3D structures for all 50,048 metabolites contained in the original KNApSAcK database. Here, we describe details of the KNApSAcK-3D database and discuss its potential use in discovering new therapeutic applications for natural products. 2 Appearance of KNApSAcK-3D The main page of the KNApSAcK-3D website (Fig. 1a) has a query window where the user can search KNApSAcK-3D by compound name, KNApSAcK ID or organism. The search result window (Fig. 1b) shows a list of compounds matching the query. When the user chooses a compound in the search result window, the link leads to the compound page (Fig. 1c), which gives the chemical formula, planar structure and other information associated with the molecule, as well as the 3D molecular structure display using a Jmol applet (Herraez 2006). The user may be required to install Java in order to view the 3D display. We have tested KNApSAcK-3D on several browsers including Firefox, Internet Explorer and Safari on Microsoft Windows 7 and Mac OSX. Although there is no particular system requirement, a faster graphics card may be desirable for smooth graphic performance. As the scientific (Latin) names of organisms may be unfamiliar to users, the compound page offers a link to the Yahoo search engine, so the user can easily obtain the organism’s common name. The compound page also includes an option for downloading 3D structure information in MDL’s mol format. Link from the KNApSAcK Core System A link to the KNApSAcK-3D main page is found on the main page of the KNApSAcK family, as shown in Fig. 2a. Instead of using the search system of the KNApSAcK-3D main page, users can employ the search capability of the KNApSAcK core system (Fig. 2b) to search the KNApSAcK database, and click the 3D button on the resulting compound page of the KNApSAcK core system (Fig. 2c) to obtain 3D structure information from the KNApSAcK-3D compound page (Fig. 1c). Automated 3D Structure Generation Using Balloon Planar chemical structures described in MDL’s mol format were converted by Balloon software (Vainio et al. 2007). Balloon version 1.0.3.772 was downloaded from the Balloon web site (http://users.abo.fi/mivainio/balloon/index.php) and used to convert planar structures into 3D structures as described in detail below. The flowchart of the overall procedure is illustrated in Fig. 3. Initially, we converted the planar chemical structure described in .mol file format (2d.mol) into 3D structures (3dx.mol) without attaching hydrogen atoms, using the following command/options. This process was necessary because Balloon cannot attach hydrogen atoms directly to the planar structure. For each compound, we chose the most stable conformation from among 20 MMFF-optimized structures generated by Balloon. Plant Cell Physiol. 54(2): e4(1–8) (2013) doi:10.1093/pcp/pcs186 ! The Author 2013. KNApSAcK-3D: a metabolite 3D structure database Fig. 1 The appearance of the KNApSAcK-3D web site. The main page (a) has a query window where users can search metabolites by compound name, KNApSAcK ID or organism (if metabolites are known). The hits are listed as search results (b), where the user can pick one of the matched compounds. The compound page (c) contains information on the selected compound, including the KNApSAcK ID (C_ID), compound name and chemical formula, and displays the planar and 3D structures of the compound. It also lists the organisms that contain the compound as a metabolite. Purple and red buttons to the right of each organism’s scientific name offer links to Yahoo search pages for the common names in English and Japanese, respectively. Plant Cell Physiol. 54(2): e4(1–8) (2013) doi:10.1093/pcp/pcs186 ! The Author 2013. 3 K. Nakamura et al. Fig. 2 The appearance of the KNApSAcK web site. The main page of the KNApSAcK family (a) has a link to the KNApSAcK-3D main page and the KNApSAcK core system (b). The compound page of the KNApSAcK core system (c) has a link to the corresponding compound page of KNApSAcK-3D. 4 Plant Cell Physiol. 54(2): e4(1–8) (2013) doi:10.1093/pcp/pcs186 ! The Author 2013. KNApSAcK-3D: a metabolite 3D structure database balloon -f MMFF94.mff –nconfs 20 –nGeneration 300 2d.mol 3dx.mol We chose the lowest energy conformation from the generated structures, using a Perl script, and designated it 3dxm.mol. Then we carried out a conformational search with hydrogen atoms, using the following command: balloon -f MMFF94.mff –nconfs 20 –nGeneration 300 3dxm.mol 3dH.mol Then, we once more chose the lowest energy structure from the generated structures, using a Perl script. The computational time largely depended on the size and complexity of the molecule. We converted 1,000 molecules per script, and the average computational time per script was approximately 4–8 h. We used a Mac Pro with 2.93 GHz Intel Xeon processors; since the memory requirement was relatively low, we could run several jobs at the same time. Manual Generation of 3D Structures Using Chem3D For structures that were difficult to convert using automated procedures, we manually created 3D structures mainly using the Chem3D software (Chem and Bio Draw Ultra 12.0; Cambridge Software Corporation). We prepared a low-energy conformation for each molecule using the following procedure. First, the structure provided by the Chem3D interface (obtained by simply opening the planar structure) was optimized using the MMFF94 force field. For relatively large molecules, the maximum iteration cycle for the optimization was set to 1,000 (from a default value of 500) to achieve the convergence. The resulting structure was visually inspected to confirm the stereochemistry and make sure no parts were tangled (this process occasionally yielded incorrect stereochemistry or a bad structure, such as a bond going through other rings). The latter problem was most often found in large molecules with many saccharide chains. The incorrect structures were manually fixed by moving the involved atoms and re-optimizing the structure with the MMFF94 force field. For each structure, we then carried out a molecular dynamics simulation for 2 ps (1,000 steps with 2 fs step interval), in order to remove steric constraints. Finally, a MMFF94 geometry optimization was carried out to obtain the final 3D structure of the molecule. When MMFF94 parameters were not available for a portion of the moiety, we used MM2 force field potentials (Allinger 1977) instead. This process automatically attached lone pair pseudo-atoms for heteroatoms. We removed these lone pairs before saving the final structure as a .mol file. Fig. 3 Procedure of automated 3D structure generation. Converted Structures Most of the compounds in the KNApSAcK database (49,709 out of 50,048; 99.3%) were successfully converted using Balloon. Plant Cell Physiol. 54(2): e4(1–8) (2013) doi:10.1093/pcp/pcs186 ! The Author 2013. 5 K. Nakamura et al. For the remaining 339 compounds (listed in the Supplementary data), we manually constructed 3D structures. The failures of our automated procedure can be categorized into two groups: (A) problems with structure generation, and (B) problems with Force Field. The molecules in the first category can be further divided into four groups: (A1) fused-ring systems with spiro/bicyclo rings; (A2) three-membered rings with double bonds; (A3) large molecules (>40 carbon atoms); and (A4) molecules with large (>7 members) rings. Category (B) includes chemical moieties that are not described in MMFF94, such as nitrogen oxide, ionized moieties or metal complexes. Some typical examples of these unconverted structures are shown in Fig. 4, including (A1) a bicyclic ring system (Fig. 4a); (A2) a three-membered ring with a double bond Fig. 4 Examples of structures that are not amenable to automated 3D structure generation: (a) gibberellin A2 (C00000098), (b) azirinomycin (C00018800), (c) phaseoloside D (C00003541), (d) citlalitrione (C00033725) and (e) makaluvamine J (C00028552). 6 Plant Cell Physiol. 54(2): e4(1–8) (2013) doi:10.1093/pcp/pcs186 ! The Author 2013. KNApSAcK-3D: a metabolite 3D structure database (Fig. 4b); (A3) a large steroid molecule associated with an oligosaccharide chain (Fig. 4c); (A4) a nine-membered ring (Fig. 4d); and (B) a structure with an ionized moiety (Fig. 4e). We are currently communicating with the developer of Balloon, and believe that some of the problems due to structure generation will be resolved in future releases. Expected Uses of KNApSAcK-3D One might ask why it is important to generate the 3D structures of natural compounds for which we already know the primary biological function in host organisms. However, many such natural compounds possess unexpected but useful and important biological functions in human beings. One of the most famous examples may be acetylsalicylic acid (aspirin) (Sneader 2000); originally extracted from willow bark and Spiraea species, it acts as a cyclooxygenase inhibitor and has numerous therapeutic applications ranging from analgesia to cancer prevention. Another intriguing example is paclitaxel (Taxol) (Peltier et al. 2006); originally extracted from the Pacific yew tree (Taxus brevifolia), this compound can be used to treat lung, ovarian breast, head and neck cancer, and is one of the most successful anticancer drugs in the pharmaceutical industry. Considering that there are no obvious advantages for the willow or yew trees to be able to cure human disease, these intriguing efficacies are most likely to be fortunate accidents. However, it has been shown that researchers have a better chance of finding an effective drug when searching among a set of natural products than just randomly browsing through the entirety of chemical space (Oprea and Gottfries 2001, Dobson 2004). The inherent properties of natural products or secondary metabolites apparently make them more likely to have biological activity in an unrelated organism (Butler 2004). Although such functions are more often found for newly isolated natural products, there are many examples of a new function or use being found for known compounds, such as in the case of aspirin. Therefore, we think that re-screening of secondary metabolites (e.g. for new target proteins) would be a good strategy for identifying new functions for known molecules. One of the challenges for SBDD, including docking studies, is that the most stable structure for a prepared compound can differ from that observed when the molecule is attached to a target protein. To deal with this issue, some docking software packages simultaneously perform both conformational ligand searches and docking searches (Mizutani et al. 2006), such that the ligand molecule can change shape during the docking search to fit the binding site on the target protein. Therefore, it is sufficient for our 3D compound database to provide a reasonably stable structure. In addition to application for rigorous structure-based drug design, the 3D structures of compounds can also be used for more qualitative analyses, such as 3D-QSAR (Kubinyi 1993). One advantage of finding effective drug candidates among secondary metabolites is that the material is likely to be abundant; once an interesting function has been suggested by virtual screening, materials for the necessary bioassays, optimizations and drug production may be easily obtained by cultivating the source organisms. Future Development and Concluding Remarks All available automated 3D structure generation software packages have their limitations, so it is important to maintain a manually curated database with high quality structures for reliable SBDD. This is primarily why we initiated the KNApSAcK-3D project. Although Balloon did an excellent job generating the 3D structures, the automated nature of the protocol means that a number of inappropriate structures may be included in the results (and therefore in the database). One common problem was the improper recognition of stereochemistry from the planar structure. Several traditional notations of stereochemistry (e.g. for saccharide molecules) are difficult for computer programs to recognize. We are in the process of manually assessing each structure in an effort to improve the quality of the data, and have already made significant improvements. Meanwhile, the corresponding authors look forward to receiving comments regarding the quality of the structures or possible improvements to the KNApSAcK-3D database. There are several physicochemical parameters or properties known to be associated with drug likelihood of the molecule, or pharmacophore (Mason 2001). By using 3D structures prepared in the KNApSAcK-3D database, many of these values can be estimated. We are currently preparing these molecular descriptors, including molecular volume and electrostatic properties such as dipole moment. for each molecule included in KNApSAcK-3D, so that the users can utilize these values to choose their target molecules. Data Availability The individual 3D structures can be downloaded from the compound page (Fig. 1c), and the 3D data are available from the corresponding authors upon request. Supplementary data Supplementary data are available at PCP online. Funding This work was supported by the National Bioscience Database Center in Japan; the Ministry of Education, Culture, Sports, Science and Technology of Japan [Grant-in-Aid for Scientific Research on Innovative Areas ‘Biosynthetic machinery: Deciphering and regulating the System for Creating Structural Diversity of Bioactive Metabolites (2007)’]. Plant Cell Physiol. 54(2): e4(1–8) (2013) doi:10.1093/pcp/pcs186 ! The Author 2013. 7 K. Nakamura et al. References Afendi, F.M., Okada, T., Yamazaki, M., Hirai-Morita, A., Nakamura, Y., Nakamura, K. et al. (2012) KNApSAcK Family Databases: integrated metabolite–plant species databases for multifaceted plant research. Plant Cell Physiol. 53: e1(1–12). Allinger, N.L. (1977) Conformational analysis. 130. MM2. A hydrocarbon force field utilizing V1 and V2 torsional terms. J. Amer. Chem. Soc. 99: 8127–8134. Butler, M.S. (2004) The role of natural product chemistry in drug discovery. J. Nat. Prod. 67: 2141–2153. Dobson, C.M. (2004) Chemical space and biology. Nature 436: 824–828. Halgren, T.A. (1996) Merck Molecular Force Field. 1. Basis, form, scope, parameterization and performance of MMFF94. J. Comput. Chem. 17: 490–313. Herraez, A. (2006) Biomolecules in the computer. Jmol to the rescue. Biochem. Mol. Biol. Edu. 34: 255–261. Hirose, F., Inagaki, N., Hanada, A., Yamaguchi, S., Kamiya, Y., Miyano, A. et al. (2012) Cryptochrome and phytochrome cooperatively but independently reduce active gibberellin content in rice seedlings under light irradiation. Plant Cell Physiol. 53: 1570–1582. Jeknic, Z., Morre, J.T., Jeknic, S., Jevremovic, S., Subotic, A. and Chen, T.H.H. (2012) Cloning and functional characterization of a gene for capsanthin-capsorubin synthase from tiger lily (Lilium lancifolium thunb. ‘Splendens’). Plant Cell Physiol. 53: 1899–1912. KubinyiH. In 3D Qsar in Drug Design, Theory Methods and Applications Kluwer Academic Publishers, Dordrecht. Luo, L., Li, W., Miura, K., Ashikari, M. and Kyozuka, J. (2012) Control of tiller growth of rice by OsSPL14 and strigolactones, which work in two independent pathways. Plant Cell Physiol. 53: 1793–1801. Mason, J.S., Good, A.C. and Martin, E.J. 3-D Pharmacophores in drug discovery. Curr. Pharm. Des. 7: 567–597. Mewis, I., Schreiner, M., Nguyen, C.N., Krumbein, A., Ulrichs, C., Lohse, M. et al. (2012) UV-B irradiation changes specifically the secondary metabolite profile in broccoli sprouts: induced signaling overlaps with defense response to biotic stressors. Plant Cell Physiol. 53: 1546–1560. Miteva, M.A., Guyon, F. and Tuffery, P. (2010) Frog2: efficient 3D conformation ensemble generator for small compounds. Nucleic Acids Res. 38: W622–E627. Mizutani, M.Y., Takamatsu, Y., Ichinose, T., Nakamura, K. and Itai, A. (2006) Effective handling of induced-fit motion in flexible docking. Proteins 63: 878–891. 8 Oishi, T., Tanaka, K., Hashimoto, T., Shinbo, Y., Jumtee, K., Bamba, T. et al. (2009) An approach to peak detection in GC-MS chromatograms and application of KNApSAcK database in prediction of candidate metabolites. Plant Biotechnol. 26: 167–174. Okada, T., Afendi, F.M., Altaf-Ul-Amin, M., Takahashi, H., Nakamura, K. and Kanaya, S. (2010) Metabolomics of medicinal plants: the importance of multivariate analysis of analytical chemistry data. Curr. Comput. Aided Drug Des. 6: 179–196. Oprea, T.I. and Gottfries, J. (2001) Chemography: the art of navigating in chemical space. J. Comb. Chem. 3: 157–166. Peltier, S., Oger, J.-M., Lagarce, F., Couet, W. and Benoit, J.-P. (2006) Enhanced oral paclitaxel bioavailability after administration of paclitaxel-loaded lipid nanocapsules. Pharm. Res. 23: 1243–1250. Puranen, J.S., Vainio, M.J. and Johnson, M.S. (2010) Accurate conformation-dependent molecular electrostatic potentials for high-throughput in silico drug discovery. J. Comput. Chem. 31: 1722–1732. Sadowski, J. and Gasteiger, J. (1994) Comparison of automatic three-dimensional model builders using 639 X-ray structures. J. Chem. Inf. Comput. Sci. 34: 1000–1008. Sato, S., Tanabe, S., Hirakawa, H., Asamizu, E., Shirasawa, K., Isobe, S. et al. (2012) The tomato genome sequence provide insights into fleshy fruit evolution. Nature 485: 635–641. Seo, S., Gomi, K., Kaku, H., Abe, H., Seto, H., Nakatsu, S. et al. (2012) Identification of natural diterpenes that inhibit bacterial wilt disease in tobacco, tomato and arabidopsis. Plant Cell Physiol. 53: 1432–1444. Seto, Y., Kameoka, H., Yamaguchi, S. and Kyozuka, J. (2012) Recent advances in strigolactone research: chemical and biological aspects. Plant Cell Physiol. 53: 1843–1853. Sneader, W. (2000) The discovery of aspirin: a reappraisal. BMJ 321: 1591–1594. Takahashi, H., Hirai, A., Shojo, M., Matsuda, K., Parvin, A.K., Asahi, H. et al. (2011) Species–metabolite relation database KNApSAcK and its multifaceted retrieval system, KNApSAcK family. In Handbook of Systems Toxicology. Edited by Cascano, D.A. and Sahu, S.C. pp. 291–298. John Wiley & Sons Ltd, Chichester. Vainio, M.J. and Johnson, M.S. (2007) Generating conformer ensembles using a multiobjective genetic algorithm. J. Chem. Inf. Model. 47: 2462–2474. Yamada, M., Nakamura, K., Ichinose, T. and Itai, A. (2006) Starting point to molecular design: efficient automated 3D model builder Key3D. Chem. Pharm. Bull. 54: 1680–1685. Plant Cell Physiol. 54(2): e4(1–8) (2013) doi:10.1093/pcp/pcs186 ! The Author 2013.
© Copyright 2026 Paperzz