BIT16 Book of Abstracts
16-18 June 2016, Toruń, Poland

PROGRAM COMMITTEE:
• Prof. Wieslaw Nowak (Nicolaus Copernicus University, Torun, Poland)
• Prof. Janusz Bujnicki (International Institute of Molecular and Cell Biology in Warsaw and Adam Mickiewicz University in Poznan, Poland)
• Prof. Jarek Meller (University of Cincinnati, USA)
• Prof. Jerzy Tiuryn (University of Warsaw, Poland)
• Dr. hab. Witold Rudnicki (University of Warsaw, Poland)

LOCAL ORGANIZING COMMITTEE:
• Prof. Wieslaw Nowak (Nicolaus Copernicus University, Torun, Poland)
• Dr. Aleksandra Gruca (Silesian University of Technology, Gliwice, Poland)
• Dr. Lukasz Peplowski (Nicolaus Copernicus University, Torun, Poland)
• Dr. Anna Gogolińska (Nicolaus Copernicus University, Torun, Poland)
• M. Eng. Rafal Jakubowski (Nicolaus Copernicus University, Torun, Poland)
• M. Eng. Jakub Rydzewski (Nicolaus Copernicus University, Torun, Poland)

Contents

Lectures

Eran Elhaik
    From DNA to Home Village in 3 seconds: using the Geographic Population Structure (GPS) to empower personalized medicine
Rafal Ploski
    Whole exome sequencing in medical genetics
Andrzej Kloczkowski
    Integration of structural and dynamics data for proteomics
Andrzej Koliński
    Multiscale modeling of large protein systems
Pawel P. Labaj
    City microbiome as a missing element of exposome for Personalized Wellness and Medicine
Lucjan Wyrwicz
    Prediction of non-interacting protein pairs for 'omics' studies in translational medicine
Malgorzata Kotulska
    On healthy and unhealthy contacts. Bioinformatics point of view
Alexander Wlodawer
    Structural studies of medically-interesting protease inhibitors and lectins
Michal Laźniewski
    Application of molecular docking for predicting protein substrate specificity
Mai Suan Li
    A novel method for navigating optimal path for ligand escape from binding site and application to drug design
Sebastian Kmiecik
    Peptide drug design using CABS-dock web server for protein-peptide docking
Karina Kubiak-Ossowska
    Point-of-Care (PoC) testing kits. Focus on antibodies
Dimitar I. Vassilev
    Models for error and variants detection in de novo sequenced samples and communities
Katarzyna Werheim-Tysarowska
    Bioinformatics in genetics of rare disorders from molecular diagnostician point of view
Jacek Leluk
    Comparative studies on proteins. The effect of the wrong assumptions on the result accuracy and reliability
Jacek Śmietański
    Machine Learning Trends in RNA Bioinformatics
Maciej Sykulski
    A Mixture Gaussian Bayesian Graphical Model in application to DNA microarray segmentation robust to spatial noise
Lukasz Peplowski
    Application of the Molecular Dynamics Simulations in Health Related Issues
Jarek Meller
    LINCS Molecular Signatures as a Resource for Personalized Precision Medicine

Posters

M. Antczak • 1
    Application of new functionalities of RNAComposer in order to improve prediction accuracy
P. Boguslawska • 2
    Modeling inhibitory mechanisms in biological systems using Petri nets
M. Burdukiewicz • 3
    AmyloGram: n-gram analysis and prediction of amyloids
K. Chmielewska • 4
    Selected aspects of the participation of tobacco smoke in the development of atherosclerosis modeled and analyzed using Petri nets
M. P. Ciemny • 5
    CABS-dock web server for flexible docking of peptides to proteins
P. Daniluk • 6
    Computing maximal cliques on GPU – application to structural alignments
I. Deb • 7
    Epigenetic modifications in RNA: Molecular dynamics studies
R. Filip • 8
    Computational study on hemagglutinin antigenic sites of influenza virus type A
W. Frohmberg • 9
    New approach to de-novo genome assembly using string graph fork detection technique and longest contig path scaffolding method
M. Garbulowski • 10
    Predicting the age status of somatic mutations by Gaussian mixture decomposition
A. Gogolińska • 11
    Complete petri net study of medically relevant interactions in the immune system model
S. Goldowska • 12
    Modification of the accessibility of the human soluble epoxide hydrolase active site, in silico study
J. Jablońska • 13
    New method of Sholl analysis
R. Jakubowski • 14
    Overminimization in molecular dynamics simulations
P. Kosiorek • 15
    Influence of the artificial protein nanotube on a cell membrane
J. A. Kowalska • 16
    Searching for structural patterns in the vicinity of microRNA in plants
M. Kurczyńska • 17
    PyRosetta energy terms as indicators for protein mirror models
M. Laźniewski • 18
    Application of molecular docking for predicting protein substrate specificity
T. Magdziarz • 19
    PANTA RHEI: analysis of solvent flow in MD simulations
M. Mielczarek • 20
    NGS-based analysis of copy number variations in various cattle breeds
P. Miszta • 21
    Theoretical study of influence of mutations on superoxide dismutase SOD1 dimer by molecular dynamics
C. Pareek • 22
    Transcriptomic analysis of gene expression data from Bos taurus liver using RNA-Seq
A. Pluciennik • 23
    The evolutionary glance at tunnels in proteins
L. P. Pryszcz • 24
    Redundans: an assembly pipeline for highly heterozygous genomes
T. Ratajczak • 25
    Ranking of RNA models Using Sphere Consensus
A. Rybarczyk • 26
    Tabu Search Algorithm for RNA Partial Degradation Problem
J. Rydzewski • 27
    Conformational sampling of a biomolecular rugged energy landscape
J. Rydzewski • 28
    Ligand diffusion pathways in cytochrome P450cam
K. Smolińska • 29
    Modeling of transcription factor binding sites–a machine learning approach
B. Sokolowska • 30
    Pattern recognition approach to rheumatic diseases study
J. Sota • 31
    Massively parallel sequencing in diagnostics of genodermatoses
J. Sota • 32
    CNVs detection algorithm as a useful diagnostic tool in targeted NGS analysis
M. Stolarczyk • 33
    Distribution analysis of L - SAARs in signal peptides across multiple eukaryotes
M. Stolarczyk • 34
    Conservation metrics overview - pros and cons
A. Szabelska • 35
    Influence of the primary analysis on discovering differentially expressed genes based on RNA-Seq data
P. Weber • 36
    Towards a simple mathematical model of hampered diffusion in biological setting
J. Wiedermann • 37
    StructAnalyzer - a tool for sequence vs. structure similarity analysis
M. Wnetrzak • 38
    The impact of the crossover operator on the results of evolutionary-based algorithms in the problem of the genetic code optimization
P. Woźniak • 39
    Correlated mutations select misfolded from properly folded proteins
J. Ziobro • 40
    Mathematical modelling of immune response
T. Zok • 41
    A new method to evaluate quality of RNA 3D models
A. Żyźniewska • 42
    Study on correlated mutations in plant proteinase inhibitors - Bowman-Birk
    and Kunitz family

Lectures

From DNA to Home Village in 3 seconds: using the Geographic Population Structure (GPS) to empower personalized medicine
Eran Elhaik1
1 University of Sheffield, Department of Animal and Plant Sciences, Sheffield, UK

Humans' place of origin is known to be valuable for studying history, anthropology, genetics, and epidemiology, and has critical importance in the field of pharmacokinetics, where a growing number of treatments differ in their efficacy when applied to different populations. It is thereby not surprising that the search for a method that utilizes biological information to predict humans' place of origin has occupied scientists for millennia. Over the past four decades, scientists have employed genetic data to address this question with limited success. Biogeographical algorithms using next-generation sequencing data achieved an accuracy of 700 km in Europe but were inaccurate elsewhere. The Geographic Population Structure (GPS) [1] utilizes a small number of SNPs and can place 83% of worldwide individuals in their country of origin. Applied to over 200 Sardinian villagers, GPS placed a quarter of them in their villages and most of the remainder within 50 km of their villages. Recently, GPS localized Ashkenazic Jews to 1,500-year-old villages in northeastern Turkey whose names likely derive from the word "Ashkenaz" [2]. The accuracy and power of GPS to infer the geographical origin of worldwide individuals down to their country or, in some cases, village of origin underscore the promise of admixture-based methods for biogeography and have broad ramifications for genetic ancestry testing, disease studies, and personalized medicine.

[1] Elhaik, E. et al. Geographic population structure analysis of worldwide human populations infers their biogeographical origins. Nat. Commun. 5, doi:10.1038/ncomms4513 (2014).
[2] Das, R., Wexler, P., Pirooznia, M. & Elhaik, E.
Localizing Ashkenazic Jews to primeval villages in the ancient Iranian lands of Ashkenaz. Genome Biol. Evol., doi:10.1093/gbe/evw046 (2016).

Whole exome sequencing in medical genetics
Rafal Ploski1, ∗
1 Department of Medical Genetics, Warsaw Medical University

In 2012 the Department of Medical Genetics (Warsaw Medical University) acquired an Illumina HiSeq 1500, which allowed us to establish whole exome sequencing (WES) as a method for both research and diagnostic purposes. Since then we have performed > 1000 WES analyses, most of which aimed at finding a diagnosis in patients suspected of suffering from rare neurological disorders with a genetic basis. We also established a bioinformatics infrastructure and a pipeline that allows efficient analysis of the WES data. During the lecture, selected findings will be presented illustrating how WES enables the discovery of new mutations in known disease-associated genes (including mutations associated with novel phenotypes) as well as the discovery of novel diseases (i.e. those caused by mutations in genes not yet associated with known human diseases).

∗ To whom correspondence should be addressed: [email protected]

Integration of structural and dynamics data for proteomics
Andrzej Kloczkowski1
1 The Ohio State University College of Medicine, Columbus, USA

Multiscale modeling of large protein systems
Andrzej Koliński1
1 Faculty of Chemistry, University of Warsaw, Poland

The traditional computational modeling of protein structure, dynamics and interactions remains difficult for many protein systems. This is mostly because the protein conformational spaces and the required simulation timescales are still too large to be studied in atomistic detail. Lowering the level of protein representation from all-atom to coarse-grained opens up new possibilities for studying protein systems [1]. Possible multiscale strategies for efficient coarse-grained modeling and recent applications of CABS modeling tools are briefly discussed.
CABS (C-Alpha, Beta and Side-chain) is a medium-resolution model. In comparison with other realistic coarse-grained models, CABS provides similar resolution but is based on qualitatively different interaction and sampling concepts. The choice of united atoms for modeling main chains assumes two pseudo-atoms per residue. Side chains are represented by two spherical pseudo-atoms, one centered on Cβ and the other placed at the center of mass of the remaining portion of the side chain, where applicable. The main chain Cα positions are restricted to the knots of a cubic lattice of small spacing, equal to 0.61 Å. This lattice Cα trace is used as the only independent variable that defines the positions of the other united atoms. Recently, we provided several easy-to-use web servers based on CABS modeling techniques (available at: http://biocomp.chem.uw.edu.pl/tools). The servers are dedicated to de novo and comparative protein structure prediction [2], studies of protein dynamics [3], and unrestrained, fully flexible docking of peptides to protein receptors [4, 5]. Multiscale modeling strategies for studies of large protein complexes, combining CABS simulations with all-atom Molecular Dynamics, are briefly discussed.

Support from the National Science Center (Poland) grant MAESTRO 2014/14/A/ST6/00088 is kindly acknowledged.

[1] Kmiecik S, Gront D, Kolinski M, Wieteska L, Dawid , Kolinski A (2016) Coarse-grained protein models and their applications. Chemical Reviews, in press
[2] Blaszczyk M, Jamroz M, Kmiecik S, Kolinski A (2013) CABS-fold: server for the de novo and consensus-based prediction of protein structure. NAR 41(W1):W406-W411
[3] Jamroz M, Kolinski A, Kmiecik S (2013) CABS-flex: server for fast simulation of protein structure fluctuations. NAR 41(W1):W427-W431
[4] Kurcinski M, Jamroz M, Blaszczyk M, Kolinski A, Kmiecik S (2015) CABS-dock: web server for flexible docking of peptides to proteins without prior knowledge of the binding site.
NAR 43(W1):W419-W424
[5] Blaszczyk M, Kurcinski M, Kouza M, Wieteska L, Debinski A, Kolinski A, Kmiecik S (2016) Modeling of protein-peptide interactions using the CABS-dock web server for binding site search and flexible docking. Methods 93:72-83

City microbiome as a missing element of exposome for Personalized Wellness and Medicine
Pawel P. Labaj1, ∗
1 Chair of Bioinformatics RG, Boku University Vienna, Austria; APART Fellow of Austrian Academy of Science; MetaSUB International Consortium

In the era of fast-paced development of technology and services, there are limitless opportunities for customization to meet specific user needs. This is understandable, since non-specific interventions for non-targeted populations often fall short of desired performance expectations in health outcomes. Over the next decade, as much as half of health care will shift from the hospital and clinic to the home and community [1]. With Personalized Medicine understood as prevention and treatment strategies that take individual variability into account, we need to identify this individual variability by characterizing each person's individual baseline health state instead of resorting to population-based variable distributions. This health state baseline cannot, however, be determined using only classical medical records. Recent technological advances have created opportunities to harness additional sources of biomedical data on a real-time basis, for instance through the use of (1) mobile medical devices for monitoring dedicated health parameters (insulin, heart rate, etc.), and (2) wearables [2, 3]. Initially starting out as simple devices to monitor basic wellness parameters, these devices have in recent years attracted a lot of interest and effort from companies (e.g. Apple and Google) that are keen on developing innovations that border on wellness and healthcare.
The synergy of these two streams should provide a good estimate of the health state baseline. In order to model the estimated health state baseline and future scenarios, it is imperative to include an important, yet largely missing, third component: the exposome. This term covers all the exposures of an individual in a lifetime. So far it has mostly been associated with air quality, light, climatic variations, ozone and volatile organic compounds. But we cannot forget about the 'living' component of the exposome. As dense human environments such as cities account for over half of the world population [4] (80% in the EU), there is a need to build a molecular portrait of cities in order to study what lives around us and how it affects our health and wellbeing [5].

∗ To whom correspondence should be addressed: [email protected]
[1] Dishman, E. 2012 [www.ey.com/GL/en/Industries/Life-Sciences/The-personal-health-technologyrevolution]
[2] Milenković, A., Otto, C., & Jovanov, E. Computer communications 2012. 29, 2521–2533.
[3] Bonaccorsi, M., Fiorini, L., Cavallo, F., Esposito, R., & Dario, P. Ambient Assisted Living 2015 465–475.
[4] Afshinnekoo, E., et al. Cell Systems 2015 1 72–87.
[5] The MetaSUB International Consortium Microbiome 2016 4 24

Prediction of non-interacting protein pairs for 'omics' studies in translational medicine
Lucjan Wyrwicz1
1 Center of Oncology, Warsaw, Poland

Protein–protein interactions (PPIs) play a vital role in most biological processes. Hence their comprehension can promote a better understanding of the mechanisms underlying living systems. However, besides the cost and time limitations involved in the detection of experimentally validated PPIs, the noise in the data is still an important issue to overcome. In the last decade several in silico PPI prediction methods using both structural and genomic information were developed for this purpose.
Here we introduce a unique validation approach aimed at collecting reliable non-interacting proteins (NIPs). Thereafter the most relevant protein/protein-pair related features were selected. Finally, the prepared dataset was used for PPI classification, leveraging the prediction capabilities of well-established machine learning methods. Our best classification procedure displayed specificity and sensitivity values of 96.33% and 98.02%, respectively, surpassing the prediction capabilities of other methods, including those trained on gold standard datasets. We showed that the PPI/NIP predictive performance can be considerably improved by focusing on data preparation.

Healthy and unhealthy contact sites
Malgorzata Kotulska,1, ∗ Witold Dyrka,1 Bogumil Konopka,1 Monika Kurczyńska,1 and Pawel Woźniak1
1 Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology

Contact maps of proteins specify those amino acids that are located within a specified distance limit of each other in the 3-dimensional structure of a protein. The standard thresholds are 8 Å when distances are calculated between Cα or Cβ atoms, or 6 Å if any atoms are considered. Contact sites may provide information on intramolecular distances, useful in protein structure reconstruction or in evaluating the correctness of a structure obtained from either experimental or modeling studies. Predicting a contact map can also constitute the first stage in modeling an unknown protein structure. Moreover, information on intermolecular contact sites is essential in molecular docking used for predicting and engineering receptor binding, as applied in drug design. Research on contact sites encounters a few issues, which will be discussed in this talk. Firstly, bioinformatical prediction of a protein's contact sites from its amino acid sequence can be an essential step in modeling protein structures. The best tools, however, still do not provide sufficient accuracy.
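As a minimal sketch of the contact-map definition above, the following Python snippet computes a boolean map from one representative atom per residue, using the 8 Å threshold. The function name `contact_map`, the toy coordinates, and the choice of an inclusive threshold are illustrative assumptions, not part of the authors' tools. The last lines also check that reflecting the coordinates leaves the map unchanged, since a reflection preserves all pairwise distances.

```python
import numpy as np

def contact_map(coords, threshold=8.0):
    """Return a boolean matrix: True where two residues lie within `threshold` Å.

    `coords` holds one representative atom (e.g. Calpha or Cbeta) per residue.
    The inclusive comparison (<=) is an assumption made for this sketch.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise difference vectors via broadcasting, shape (n, n, 3)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    contacts = dist <= threshold
    # A residue is not counted as being in contact with itself
    np.fill_diagonal(contacts, False)
    return contacts

# Hypothetical toy coordinates in Å (not taken from any real structure)
toy_coords = np.array([
    [0.0, 0.0, 0.0],
    [3.8, 0.0, 0.0],
    [7.6, 0.0, 0.0],
    [11.4, 0.0, 0.0],
    [3.8, 5.0, 0.0],
])

cmap = contact_map(toy_coords)

# Reflect the structure through the yz-plane and recompute the map
mirrored = toy_coords * np.array([-1.0, 1.0, 1.0])
same_map = np.array_equal(cmap, contact_map(mirrored))
```

For real structures the coordinates would come from a parsed structure file. For long chains, a neighbor-search structure would avoid building the full distance matrix.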
We will discuss different approaches and the problems that appear [1, 2]. Secondly, molecules of natural and mirror orientations share the same contact maps, which poses a problem for their use. Proteins are 3-dimensional objects which can assume different orientations: for each protein several mirror structures could be constructed. The looking-glass orientation may occur at the superficial level of a molecule's general symmetry or of its secondary structure orientations. Deeper levels of inversion may concern different chirality of amino acids, which may accompany mirror orientations at the higher levels. All these structures can also assume similarly stable energy levels, hence all can exist in the real world. However, nature is not symmetrical and prefers only one type of orientation: natural amino acids are left-handed, secondary structures have a specific preference (e.g. helices are typically right-handed), and at the level of general symmetry only one structure has evolved. Mirror molecules are of great interest to researchers pursuing a project of "mirror life", or in a quest for non-degradable aptamers. But in bioinformatics mirror structures can be troublesome. In the process of protein structure reconstruction based on a contact map, two sets of models are generated, which may be energetically equivalent. Selecting the structure that nature would have chosen is not always straightforward [3]. Finally, an uncontrolled change in the pattern of a peptide's contact sites is an issue appearing in amyloid proteins. An irregular physiological pattern of contact sites may switch into very dense contact sites which form zipper-like beta structures. There are a few examples in which this leads to a desirable structure, very durable in terms of its mechanical properties and not susceptible to proteolytic enzymes. However, in most cases in biological tissue these dense aggregates trigger a chain of events leading to cell death, e.g.
in Alzheimer's and other neurodegenerative diseases. The question we address is whether the onset of such unhealthy contacts is predictable and how we can control the process [4–6].

∗ To whom correspondence should be addressed: [email protected]
[1] Konopka BM, Ciombor M, Kurczynska M, Kotulska M. J Membr Biol. 2014;247(5):409-20.
[2] Wozniak PP, Kotulska M. J Mol Model. 2014;20(11):2497
[3] Kurczynska M, Kania E, Konopka BM, Kotulska M. J Mol Model. 2016;22(5):111.
[4] Gasior P, Kotulska M. BMC Bioinformatics. 2014;15:54.
[5] Wozniak PP, Kotulska M. Bioinformatics. 2015;31(20):3395-7.

Structural studies of medically-interesting protease inhibitors and lectins
Alexander Wlodawer,1 Jacek Lubkowski,1 Alla Gustchina,1 Dongwen Zhou,1 Michal Jakob,1, 2 Barry R. O'Keefe,2 Rodrigo da Silva Ferreira,3 Yara A. Lobo,3 Daiane Hansen,3 and Maria L. V. Oliva3
1 Macromolecular Crystallography Laboratory, Center for Cancer Research, National Cancer Institute, Frederick, MD 21702, USA
2 Molecular Targets Laboratory, Center for Cancer Research, National Cancer Institute, Frederick, MD 21702, USA
3 Departamento de Bioquímica, Universidade Federal de São Paulo, 04044-020 São Paulo, SP, Brazil; e-mail: [email protected]

Several protease inhibitors and lectins with anti-cancer properties have been investigated by X-ray crystallography as well as by biochemical and biophysical techniques. Two of them are potent inhibitors of trypsin-related enzymes. EcTI, isolated from the seeds of Enterolobium contortisiliquum, inhibits the invasion of gastric cancer cells through alterations in the integrin-dependent cell-signaling pathway. BbKI, found in Bauhinia bauhinioides seeds, is a kallikrein inhibitor with a reactive site sequence similar to that of kinins, the vasoactive peptides inserted in kininogen moieties. A much weaker protease inhibitor isolated from the bark of the Crataeva tapia tree (CrataBL) also functions as a lectin.
BfL, a GalNAc-specific lectin from the Brazilian orchid tree Bauhinia forficata, was shown to inhibit the growth of several cancer lines. CGL, a lectin isolated from the sea mussel Crenomytilus grayanus, was investigated based mainly on the similarity of its sequence to another lectin, MytiLec, which was resistant to crystallization. We determined high-resolution crystal structures of EcTI, both free and in complex with bovine trypsin, in the process re-determining the amino acid sequence. Modeling of the putative complexes of EcTI with several serine proteases and a comparison with equivalent models for other Kunitz inhibitors elucidated the structural basis for the fine differences in their specificity. The structure of free BbKI indicated that the presence of disulfide bonds is not necessary for stabilization of the fold of the members of this family. A model of a complex of BbKI with plasma kallikrein indicates the need for mutual rearrangement of the interacting molecules. We have also determined the high-resolution crystal structure of glycosylated CrataBL. We have shown that, as a lectin, CrataBL binds only sulfated oligosaccharides, most likely heparin and its derivatives. CGL displays antibacterial, antifungal, and antiviral activities, and has high affinity for mucin-type receptors, which are abundant on some cancer cells. We determined its crystal structure and modeled the glycan-binding pockets, based on the location of the glycerol molecules bound in the three sites exhibiting quasi-threefold symmetry. A number of structures of BfL elucidated the mode of binding of its primary ligand GalNAc, as well as of a number of cancer-related Tn-antigens and blood group antigens, explaining the basis of its very strict specificity, similar to the specificity of CGL despite a completely different three-dimensional structure.
Application of molecular docking for predicting protein substrate specificity
Michal Laźniewski,1, 2, ∗ Krzysztof Kuchta,1 Dariusz Plewczyński,3 and Krzysztof Ginalski1
1 Laboratory of Bioinformatics and Systems Biology, Centre of New Technologies, University of Warsaw, Poland
2 Department of Physical Chemistry, Faculty of Pharmacy, Medical University of Warsaw, Poland
3 Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Poland

In recent years, the application of next generation sequencing has brought a substantial increase in the number of known protein sequences. The RefSeq database alone now contains more than 60 million unique sequences, 50 times more than only ten years ago. For such an overwhelming amount of data, functional assignment using only experimental techniques is a challenging task. With structural genomics providing thousands of new protein structures [1], the additional information might be utilized by structure-based methods to improve the quality of bioinformatics predictions. Thus molecular docking might prove invaluable where other theoretical techniques fail to predict the function of an analyzed protein. It has already been shown that molecular docking can frequently predict the correct conformation of a small compound in a protein-ligand complex; however, calculating the binding energy remains a challenge [2]. On the other hand, with most efforts focused on studying the interactions between proteins and drug-like inhibitors, there has been only limited emphasis on analyzing proteins and their in vivo small-molecule partners [3]. The goal of our work was a comprehensive analysis of the performance of various molecular docking algorithms in wide-scale predictions of the substrate specificity of proteins from a single organism, Escherichia coli. We analyzed all E.
coli enzymes for which the crystal structure was solved together with their cognate partner (the substrate or product of the reaction). Specifically, two sets of molecules were docked to each enzyme: (a) compounds present in the active sites of all selected proteins and (b) the entire metabolome of E. coli. The performance of four different programs (GOLD, eHiTS, Surflex and Glide) was tested, including both the ability to identify the enzyme’s cognate ligand and to correctly predict the ligand conformation. Moreover, we discuss whether applying machine learning methods could further increase the quality of such structure-based functional predictions.
∗ To whom the correspondence should be addressed: [email protected]
[1] J. Weigelt, Exp Cell Res, 2010, 316(8), 1332-1338.
[2] D. Plewczynski, M. Lazniewski, et al., J Comput Chem, 2011, 32(4), 742-755.
[3] A. Macchiarulo, I. Nobeli, et al., Nat Biotechnol, 2005, 22(8), 1039-1045.

A novel method for navigating optimal path for ligand escape from binding site and application to drug design
Mai Suan Li,1, ∗ Quan Van Vuong,2 and Tin Trung Nguyen2
1 Polish Academy of Sciences, Warsaw, Poland
2 Institute for Computational Science and Technology, Ho Chi Minh City, Vietnam
In the first part of this talk I shall present our new method for finding the optimal path to pull a ligand from the binding pocket by steered molecular dynamics (SMD). The optimal path corresponds to the minimal hindrance to ligand displacement, where the hindrance is introduced as a scoring function. Contrary to the existing CAVER method, our approach takes into account the geometry of the ligand, leading to a better correlation between experimental inhibition constants and the mechanical work estimated by SMD. In the second part, virtual screening and improved SMD and MM-PBSA methods are applied to search for potential drugs for Alzheimer’s disease (AD) in large databases of natural and synthesized compounds.
The design strategy is based on the amyloid cascade hypothesis, which posits that AD is caused by aggregation of amyloid beta (Aβ) peptides. Oligomers and protofibrils were taken as drug targets as they seem to be more cytotoxic than mature fibrils. Some of the top-hit compounds predicted by our in silico study have already passed in vitro tests for inhibition activity, blood-brain barrier crossing and non-toxicity to cells. Concerning peptide-based inhibitors, we have shown that the presence of tryptophan and proline residues in tripeptides is crucial for their tight binding to Aβ fibrils as well as for extensive fibril depolymerization. Fullerenes and their derivatives were found to have high binding affinity to Aβ and the ability to block Aβ aggregation. The binding free energy scales linearly with the size of the fullerenes. Finally, the application of our method to virtual screening of potential drugs for breast cancer will be discussed.
∗ To whom the correspondence should be addressed: [email protected]
[1] J. Nasica-Labouze, Mai Suan Li, et al., Chemical Reviews 115, 3518-3563 (2015)
[2] M.H. Viet, K. Siposova, Z. Bednarikova, A. Antosova, T.T. Nguyen, Z. Gazova, and Mai Suan Li, J. Phys. Chem. B 119, 5145 (2015).
[3] P.D.Q. Huy, and Mai Suan Li, Phys. Chem. Chem. Phys 16, 20030-20040 (2014)
[4] M. H. Viet, C-Y. Chen, C-K. Hu, Y-R. Chen, and Mai Suan Li, Plos One 8(11), e79151 (2013).
[5] P.D.Q. Huy, Y-C. Yu, S.T. Ngo, T.V. Thao, C-P Chen, Mai Suan Li, and Y-C Chen, BBA-General Subjects 1380, 2960 (2013).
[6] Quan Van Vuong, Tin Trung Nguyen, and Mai Suan Li, J. Chem. Inf. Model. 55, 2731 (2015)
[7] Tin Trung Nguyen et al, Journal of Molecular Modeling (in press).

Peptide drug design using CABS-dock web server for protein-peptide docking
Mateusz Kurcinski,1 Maciej Blaszczyk,1 Maciej Pawel Ciemny,1, 2 Andrzej Koliński,1 and Sebastian Kmiecik1, ∗
1 Faculty of Chemistry, University of Warsaw, ul.
Pasteura 1, 02-093 Warszawa, Poland
2 Faculty of Physics, University of Warsaw, ul. Pasteura 5, 02-093 Warszawa, Poland
Peptides play essential functional roles in living organisms and have recently attracted much attention for their potential therapeutic use. Therefore, structural characterization of protein-peptide interactions is a hot topic of current pharmaceutical research. Computational modeling of the structure of protein–peptide interactions is usually divided into two stages: (1) prediction of the binding site at a protein receptor surface, and then (2) docking (and modeling) the peptide structure into the known binding site. We present a comprehensive CABS-dock method for the simultaneous search of binding sites and flexible protein–peptide docking [1, 2]. CABS-dock is freely available as a user-friendly web server at http://biocomp.chem.uw.edu.pl/CABSdock/. An important feature that distinguishes CABS-dock from other state-of-the-art docking tools is the ability to account for large-scale rearrangements of selected receptor fragments, and simultaneously for full peptide flexibility, during explicit docking simulations [1–4]. This makes CABS-dock a unique tool for modeling protein-peptide interactions associated with large-scale conformational changes of both the peptide and protein receptor structures. The talk will outline the CABS-dock methodology, its unique features and modeling opportunities.
∗ To whom the correspondence should be addressed: [email protected]
[1] M. Kurcinski, M. Jamroz, M. Blaszczyk, A. Kolinski, and S. Kmiecik. Nucleic Acids Res., 2015, 43(W1): p. W419-24
[2] M. Blaszczyk, M. Kurcinski, M. Kouza, L. Wieteska, A. Debinski, A. Kolinski, and S. Kmiecik. Methods, 2016, 93: p. 72-83
[3] M. Kurcinski, A. Kolinski, and S. Kmiecik. J. Chem. Theory Comput., 2014, 10(6): p. 2224-2231
[4] M. P. Ciemny, M. Kurcinski, K. Kozak, A. Kolinski, S. Kmiecik. Methods Mol. Biol.
(in press), 2016, [arXiv:1605.09303]

Point-of-Care (PoC) testing kits. Focus on antibodies
Karina Kubiak-Ossowska1
1 ARCHIE-WeSt, University of Strathclyde, Glasgow, UK

Models for error and variant detection in de novo sequenced samples and communities
Dimitar I. Vassilev1
1 Bioinformatics Group, AgroBio Institute and Joint Genomic Centre, Sofia, Bulgaria
Faculty of Mathematics and Informatics, Sofia University, Bulgaria
Fitting suitable models for the discovery of errors and variants in de novo next-generation metagenomic sequencing is a difficult task. Even after extensive preprocessing, analysis and checking, the datasets retain sequences from multiple microbial species, which contribute a considerable amount of variation that conceals the errors. Standard denoising algorithms available for genomics can no longer be applied because of the high rate of false positives in regions with natural variation in the data; at the same time, rare natural variants are themselves a subject of study and need to be distinguished from the errors. This work uses both analytical and machine learning approaches to filter some of the false positives of other error and variant discovery algorithms applied in both metagenomic and polyploid NGS studies. A neural network and a random forest have been trained to identify the errors in the datasets with an accuracy of over 99%. While this is still insufficient for direct discovery of rare errors, it is demonstrated that the trained models provide a good filter that reduces the number of incorrectly identified errors and decreases the error/variant ratio without an increase in false negatives. Implementing such models, and improving their accuracy and running time, offers broad scope in modern medicine for improving the quality of diagnosis, therapies, clinical practice, population studies, preventive actions and insurance.

Bioinformatics in genetics of rare disorders from a molecular diagnostician’s point of view
Katarzyna Wertheim-Tysarowska1, ∗
1 Department of Medical Genetics, Institute of Mother and Child
Rare diseases are defined as disorders affecting fewer than 1 in 2000 live births. The majority of them are genetically determined, that is, they result from DNA mutations. More than 6000 inherited rare disorders have been described so far, and this number tends to grow. Human DNA, the “molecule of heredity”, contains about 3 billion base pairs, which encode around 20 000-30 000 genes (the total number of genes has not been established yet). So far, over 10 million point variations (changes of one or a few nucleotides in the DNA sequence) have been identified in the human genome. According to the Human Gene Mutation Database (HGMD), almost 150 000 of them, in about 7000 genes, have been proved to be disease-causing. Changes in the DNA sequence can occur during cell divisions and, if they arise during gamete formation, they are passed down to the offspring. It is estimated that each individual carries around 60 de novo point variations. Such mutations are passed down further to the next generations, which is one of the evolutionary mechanisms responsible for population diversity. Nevertheless, the functional effect of DNA variants can vary considerably. The majority of them do not influence cellular processes. However, mutations can also be pathogenic and affect gene expression or protein synthesis and activity. The final consequence of a nucleotide sequence change depends on several factors, such as its location, size and type. The goal of the molecular diagnostic process is to find disease-causing mutations in an affected person and interpret their clinical significance. The development of bioinformatics over the past few years has significantly influenced molecular diagnostic procedures.
Not only has access to information resources been provided and simplified, but dedicated tools facilitating data analysis and interpretation are also constantly being improved. I will present the state of the art in the field of molecular diagnostics of rare disorders and the most important bioinformatics solutions that are in routine use in this process.
∗ To whom the correspondence should be addressed: [email protected]

Comparative studies on proteins. The effect of the wrong assumptions on the result accuracy and reliability
Jacek Leluk1, ∗
1 Department of Molecular Biology, Faculty of Biological Sciences, University of Zielona Góra, Poland
Multiple sequence alignment is the fundamental step of comparative protein and genomic studies. It is an important intermediate data source used to obtain a consensus sequence defining the whole protein family, explain the variability pathways, locate the structurally and functionally significant regions, and derive many other results, not limited to the primary structure level. It is obvious that the value and reliability of these results strongly depend on the correct adjustment of the aligned sequences. A related problem is the accurate location of gaps; if gaps are misplaced, all subsequent results may be doubtful. There are a number of algorithms for accomplishing the alignment procedure, which are implemented in many programs. Most of them refer to stochastic matrices of the observed nucleotide/amino acid replacement frequency. A tremendous number of applications follow the Markovian model of mutational amino acid replacement. This work demonstrates that the approaches based on the Markovian model and applying stochastic matrices such as PAM and BLOSUM contain some inconsistencies in interpreting the protein variability occurring in nature [1]. They do not reflect the natural mechanism of molecular evolution. The methods of gap location and continuity establishment are also not always justifiable for homologous proteins.
The proposed solution, based on the genetic semihomology approach, takes into account both levels (nucleotide and amino acid) simultaneously, to reflect the natural evolutionary process (consisting of two components: mutational variability and natural selection) [2–4]. It applies the three-dimensional diagram of genetic relationships between amino acids instead of stochastic matrices of the replacement frequency. This work also discusses the parameters that must be taken into account to evaluate whether the observed sequence identity/similarity is significant or coincidental. The problem of identifying and locating correlated mutations in homologous proteins is presented as well [5–7].
∗ To whom the correspondence should be addressed: [email protected]
[1] J. Leluk, Computers & Chemistry, 2000, 24, 659-672.
[2] J. Leluk, Computers & Chemistry, 1998, 22, 123-131.
[3] J. Leluk, BioSystems, 2000, 56, 83-93.
[4] J. Leluk, B. Hanus-Lorenz, A.F. Sikorski, Acta Biochim. Polon., 2001, 48, 21-33.
[5] J. Leluk, Cell. Mol. Biol. Lett., 2000, 5, 91-106.
[6] Ly Le, J. Leluk, PLoS ONE, http://dx.plos.org/10.1371/journal.pone.0022970, 6(8) e22970, 2011.
[7] R. Filip, J. Leluk, G. Żaroffe, Adv. Biores., 2015, 6, 89-94

Machine Learning Trends in RNA Bioinformatics
Jacek Śmietański1, ∗
1 Faculty of Mathematics and Computer Science, Jagiellonian University
Many bioinformatics problems cannot be effectively solved using classical algorithms; therefore more advanced techniques, including artificial intelligence and machine learning (ML), were introduced. In short: we classify an algorithm into the “machine learning” category when it gives computers the ability to learn without being explicitly programmed. The key property of such an algorithm is the power to learn from, and make predictions on, a fairly small sample data set. Its key feature (as opposed to classical algorithms) is the ability to generalize.
The popular bioinformatics tasks strongly connected to ML techniques are “classification” and “prediction”. Examples include the classification of sequences, structures and interactions, as well as secondary and tertiary RNA structure prediction, inter- and intramolecular interaction prediction, structure-function prediction, and molecular network modeling. For each of the aforementioned challenges a number of dedicated algorithms have been developed and implemented. Among the variety of ML algorithms, SVM (Support Vector Machines) and ANN (Artificial Neural Networks) are the most commonly used in bioinformatics tasks. Others, such as HMM (Hidden Markov Models), DT (Decision Trees) and RF (Random Forests), are also popular [1]. A PubMed [2] search reveals over 500 articles published to date on ML applied to RNA problems. Over half of them were written within the last 3 years (Figure 1). This fact confirms the importance of ML methods in current bioinformatics. This study presents in detail the current needs and trends in ML algorithms applied to RNA challenges.
FIG. 1.
∗ To whom the correspondence should be addressed: [email protected]
[1] Jensen LJ, Bateman A. The rise and fall of supervised machine learning techniques. Bioinformatics. 2011;27(24):3331-3332. doi:10.1093/bioinformatics/btr585
[2] Roberts RJ (2001). “PubMed Central: The GenBank of the published literature”. Proceedings of the National Academy of Sciences, 2001 (98:2): 381–382

A Mixture Gaussian Bayesian Graphical Model in application to DNA microarray segmentation robust to spatial noise
Maciej Sykulski,1, ∗ Boguslaw Kluge,1 and Anna Gambin1
1 Medical University of Warsaw, University of Warsaw
Bayesian graphical models and Gaussian mixtures are useful tools in image analysis and data segmentation [1]. A Bayesian graphical model is learned from data using maximum a posteriori probability (MAP) estimation.
We propose a general formulation of a Mixture Gaussian Bayesian Graphical model, which can be defined on a large graph, and for which the Expectation Maximization (EM) algorithm is tractable by solving a corresponding Quadratic Programming problem, with further optimization using alternating directions. We implement an R package that allows such models to be declared and optimized with EM, benefiting from the sparsity of the formulation. We apply this approach to formulate the Background and Segments Markov random Fields model (BSMF), defined on two connected graphs: the spatial microarray grid graph, and the genomic linear graph connecting log2ratio data from a DNA microarray. Estimation of the BSMF model results in spatial denoising of the log2ratio data and in a segmentation that recovers segments of different Copy Number. We present results of our approach on real data from aCGH microarrays, compare its performance with the Circular Binary Segmentation algorithm [2], and analyze its sensitivity to the setting of prior parameters.
FIG. 1. Factor graph for BSMF(Θ, Z; x, Ω) model. x is log2ratio data. a, b are segmentation, and spatial Markov random fields respectively. z, y are indicators of breaks in the a, b fields.
∗ To whom the correspondence should be addressed: [email protected]
[1] Zhang, Y. and Brady, M. and Smith, S., Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm, Medical Imaging, 2001, 20, 45–57.
[2] Olshen, A. and Venkatraman, ES and Lucito, R. and Wigler, M., Circular binary segmentation for the analysis of array-based DNA copy number data, Biostatistics, 2004, 5, 557–572.
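The Expectation Maximization idea behind the mixture model in the preceding abstract can be illustrated on a much simpler case. The sketch below is not the BSMF model and not the authors' R package: it is plain EM for a two-component one-dimensional Gaussian mixture, with no graph structure, no priors and no Quadratic Programming step. The function names and the toy log2ratio values (a cluster near 0.0 and a cluster near 0.58) are assumptions made only for illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1D normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, n_iter=100):
    """Fit a two-component 1D Gaussian mixture with plain EM (illustrative only)."""
    overall_mean = sum(data) / len(data)
    overall_var = sum((x - overall_mean) ** 2 for x in data) / len(data)
    # Hypothetical initialization: extremes as means, overall variance for both components.
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300  # guard against division by zero
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # keep the variance strictly positive
    return weights, means, variances


if __name__ == "__main__":
    # Toy log2ratio values: 20 probes around 0.0 and 10 probes around 0.58.
    toy = [-0.1, -0.05, 0.0, 0.05, 0.1] * 4 + [0.48, 0.53, 0.58, 0.63, 0.68] * 2
    print(em_two_gaussians(toy))
```

In a model such as the one described in the abstract, the M-step would additionally have to respect the spatial and genomic neighbourhood structure; the sketch deliberately omits that part.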

Application of the Molecular Dynamics Simulations in Health Related Issues
Lukasz Peplowski,1, ∗ Rafal Jakubowski,1 and Wieslaw Nowak1
1 Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University in Toruń
The development of new, innovative and effective drugs is nowadays the most desirable research target of all pharmaceutical companies. To reach this goal, the molecular basis of a disease should be found. Most experiments are very expensive, so theoretical approaches come into play. One of the best methods of acquiring dynamical pictures of structural changes of drug targets is molecular dynamics (MD). In this work we show results of MD simulations focused on (1) the drug delivery problem and two health-related protein systems: (2) exchange protein directly activated by cAMP 2 (Epac2) [1, 2] and (3) transthyretin [3], a thyroid hormone transporter.
FIG. 1. Model of cell membrane piercing at t = 0. A SW zigzag carbon nanotube with cisplatin is on the right side of the figure.
(1) The first topic is related to bionanotechnological support in the drug delivery problem. Recognition of a cancer cell and subsequent injection of a medicine through the cell membrane can increase the effectiveness of therapy and decrease side effects. Here we present results of MD modeling of the injection of a popular anti-cancer drug, cisplatin, through the membrane (Fig. 1). A carbon nanotube was used as a nano-needle. Simulations show that it is possible to transport the drug in a carbon nanotube open at both ends through a model cell membrane.
(2) The second topic, optogenetics, is a hot one. We show how, at the molecular level, the recently developed photoswitch JB253 affects Epac2, a protein involved in insulin release from pancreatic cells. It is possible to control this process with light, using azobenzenes as photoactive molecules. Here we show the structural impact of such light-induced switching on the Epac2 protein.
(3) The last issue is related to genetics and the effects of the L55P and V30M mutations on the transthyretin protein, a thyroid hormone transporter. Numerous mutations in this protein cause the formation of amyloid fibrils and, in effect, lead to diseases like Familial Amyloidogenic Neuropathy or schizophrenia. Our MD calculations and careful analysis show almost unnoticeable but, in our opinion, critical effects of the mutations.
∗ To whom the correspondence should be addressed: [email protected]
[1] J. Broichhagen, M. Schönberger, et al., Nat Commun., 2014, 5:5116
[2] K. Herbst, C. Coltharp, et al., Chem Biol., 2011, 18:2, 243-51
[3] D. Trivella, L. Bleicher, et al., J. Struct. Biol., 2010, 170, 522–531

LINCS Molecular Signatures as a Resource for Personalized Precision Medicine
Jarek Meller1, ∗
1 Cincinnati Children’s Hospital Medical Center, Cincinnati, USA
Biomedicine is increasingly becoming data driven and data centric. Multiple concerted efforts are under way to collect massive amounts of biomedical data, including those that pertain to personalized precision medicine. The LINCS consortium, which stems from the Connectivity Map and aims to generate a comprehensive library of network-based molecular signatures of chemical (∼30,000 small drug-like molecules) and genetic (∼20,000 gene knockdowns) perturbations in a number of cell lines (∼100-1,000 different cell lines), is a prime example of such an effort. Potential applications of the LINCS library in the context of mechanistic studies as well as personalized precision interventions are discussed, with a special emphasis on proteomic profiles and the piLINCS resource that we developed recently as part of the LINCS/BD2K Data Integration and Coordination Center.
∗ To whom the correspondence should be addressed: [email protected]

Posters

Application of new functionalities of RNAComposer in order to improve prediction accuracy
Maciej Antczak,1, 2, ∗ Mariusz Popenda,2, 3 Tomasz Zok,1, 2 Joanna Sarzynska,2, 3 Tomasz Ratajczak,1, 2 Ryszard W. Adamiak,1, 2, 3 and Marta Szachniuk1, 2, 3
1 Institute of Computing Science, Poznan University of Technology, Poznan, Poland
2 European Center for Bioinformatics and Genomics, Poznan University of Technology, Poznan, Poland
3 Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
The roles which RNA molecules play in many cellular processes are strongly dependent on their tertiary structure. In contrast to the protein field, a much smaller number of RNA 3D structures have been determined by experimental methods and deposited in structural data banks. Therefore, computational methods designed for tertiary structure prediction of RNAs are of great importance. In recent years over a dozen methods devoted to RNA 3D structure prediction have been developed [1]. Among them is RNAComposer, a fully automated system for 3D structure prediction of large RNAs introduced by our team [2]. Since its inception in 2012, RNAComposer has remained one of the most popular resources, continually applied for scientific and academic purposes. However, an in-depth analysis of 3D models generated by RNAComposer reveals that in a number of cases the accuracy of prediction might be improved [3]. Thus, we have extended the functionality of our system to allow users to apply their own structural elements in the prediction process and to influence the search through the database of available RNA 3D structure elements. Moreover, we have also incorporated three new in silico methods, which lead to a greater diversity of resultant RNA 3D models.
The introduced functionality contributes to a significant improvement in the reliability of the predicted 3D models, which was observed when RNAComposer was applied to model the 3D structures of precursors of miR160 family members. This work was supported by grants from the National Science Center, Poland [2012/05/B/ST6/03026, 2012/06/A/ST6/00384].
∗ To whom the correspondence should be addressed: [email protected]
[1] D. Dufour, M.A. Marti-Renom, Software for predicting the 3D structure of RNA molecules, Wiley Interdisciplinary Reviews: Computational Molecular Science, 2015, 5, 56-61.
[2] M. Popenda, M. Szachniuk, M. Antczak, K.J. Purzycka, P. Lukasiak, N. Bartol, J. Blazewicz, R. W. Adamiak, Automated 3D structure composition for large RNAs, Nucleic Acids Research, 2012, 40(14), e112.
[3] Z. Miao, R.W. Adamiak, M.-F. Blanchet, M. Boniecki, J.M. Bujnicki, S.-J. Chen, C. Cheng, G. Chojnowski, F.-C. Chou, P. Cordero, J.A. Cruz, A. Ferre-D’Amare, R. Das, F. Ding, N.V. Dokholyan, S. Dunin-Horkawicz, W. Kladwang, A. Krokhotin, G. Lach, M. Magnus, F. Major, T.H. Mann, B. Masquida, D. Matelska, M. Meyer, A. Peselis, M. Popenda, K.J. Purzycka, A. Serganov, J. Stasiewicz, M. Szachniuk, A. Tandon, S. Tian, J. Wang, Y. Xiao, X. Xu, J. Zhang, P. Zhao, T. Zok, E. Westhof, RNA-Puzzles Round II: Assessment of RNA structure prediction programs applied to three large RNA structures, RNA, 2015, 21, 1-19.

Modeling inhibitory mechanisms in biological systems using Petri nets
Paulina Boguslawska1, ∗ and Piotr Formanowicz1, 2
1 Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznań, Poland
2 Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznań, Poland
Petri nets were first described in 1962 by Carl Adam Petri [4]. Due to their properties they have been widely used to model and analyse technical systems (cf. [5]).
In 1952 the British neurophysiologists Alan Lloyd Hodgkin and Andrew Fielding Huxley were among the first scientists to use a systems approach to explain a biological process. They developed a mathematical model aimed at explaining how the action potential propagates along the axon of a neuronal cell [2]. Currently, Petri nets are increasingly being used to create models of biological systems [3]. The analysis of such models is based primarily on t-invariants, which correspond to subprocesses occurring in the modeled system. Similarities are sought in the set of t-invariants, and on their basis some unknown properties of the system can be deduced. Here, methods based on clustering and MCT sets (maximal common transition sets) are used [1]. In biology, some processes contain feedbacks and inhibitory mechanisms. For the modeling of biological processes to be accurate, it is important to describe both of these mechanisms properly in the models. Unfortunately, the use of inhibitory arcs makes it impossible to apply the analysis methods based on t-invariants. Therefore, it is necessary to develop methods of modeling inhibitory mechanisms based only on the basic components of Petri nets. In this work some ideas for solving this problem are presented.
∗ To whom the correspondence should be addressed: [email protected]
[1] D. Formanowicz, P. Formanowicz, T. Glowacki, A. Kozak, M. Radom, Hemojuvelin–hepcidin axis modeled and analyzed using Petri nets, Journal of Biomedical Informatics, 2013, 46, 1030-1043.
[2] A. L. Hodgkin, A. F. Huxley, A quantitative description of membrane current and its application to conduction and excitation in nerve, The Journal of Physiology, 1952, 117(4), 500-544.
[3] Koch I, Reisig W, Schreiber F, editors. Modeling in systems biology: the Petri net approach. London: Springer; 2011.
[4] C. A. Petri, Kommunikation mit Automaten, Institut für Instrumentelle Mathematik, Bonn, 1962
[5] J.-M. Proth, X.
Xie, Petri Nets: A Tool for Design and Management of Manufacturing Systems, John Wiley & Sons, Inc., 1997.

AmyloGram: n-gram analysis and prediction of amyloids
Michal Burdukiewicz,1, ∗ Piotr Sobczyk,2 Pawel Mackiewicz,1 and Malgorzata Kotulska3
1 Department of Genomics, Faculty of Biotechnology, University of Wroclaw
2 Faculty of Pure and Applied Mathematics, Wroclaw University of Science and Technology
3 Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology
Amyloids are proteins associated with a number of clinical disorders (e.g., Alzheimer’s, Creutzfeldt-Jakob’s and Huntington’s diseases). Despite their diversity, all amyloid proteins can undergo aggregation initiated by 6- to 15-residue segments called hot spots. As a result, amyloids form unique and often zipper-like β-structures, which can turn out to be harmful [1]. To find patterns defining the hot spots, we analyzed n-grams (continuous or discontinuous sequences of n elements) extracted from amyloidogenic and non-amyloidogenic peptides collected in the AmyLoad database [2]. Using 1- to 3-grams with gaps as the input data, we trained random forests as predictors of amyloidogenicity. The results were validated by cross-validation, using predictors trained on data sets with peptides of different lengths. The classification efficiency evaluated as specificity was the best for learners trained on the shortest sequences, whereas predictors based on the longest sequences in the training dataset were characterized by greater sensitivity. Since amyloidogenicity may depend not on the exact sequence of amino acids but on more general properties of the amino acids in the sequence, we constructed 524 284 reduced amino acid alphabets of different lengths (three to six letters) based on all possible combinations of handpicked physicochemical properties of the amino acids.
The cross-validation of predictors employing the different alphabets revealed that the best-performing alphabet has a length of six letters. It also outperformed predictors relying on the full alphabet of 20 amino acids. The reduced alphabet is based on four main properties describing amyloidogenicity: hydrophobicity, average flexibility indices, polarizability and thermodynamic β-sheet propensity. We designed a distance measure to compare differences between the alphabets. The correlation between the distance from the best-performing reduced amino acid alphabet and the obtained AUC value was negative (-0.44) and significant (p-value smaller than 2.2e-16), which supports our result that reduced amino acid alphabets lead to more accurate models of peptides involved in amyloidogenicity. Since n-grams create very large feature spaces, we developed the Quick Permutation Test (QuiPT) for the selection of the most informative attributes. We found 65 n-grams that are the most relevant to the discrimination of amyloid and non-amyloid sequences. The aliphatic and hydrophobic amino acids (isoleucine, leucine, valine) commonly occur in n-grams associated with amyloidogenicity, while their aromatic counterparts (phenylalanine, tryptophan and tyrosine) are less frequent. The best-performing predictors based on the full and the reduced amino acid alphabets were benchmarked on an external dataset against the most popular tools for amyloid peptide detection. All forests trained on n-grams outperformed the existing software, and the predictor based on the best-performing reduced amino acid alphabet, AmyloGram, obtained the highest values of the performance measures (AUC: 0.90, MCC: 0.63). AmyloGram is available as a web server: www.smorfland.uni.wroc.pl/amylogram/.
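The two feature-engineering steps described in the abstract above, recoding peptides in a reduced alphabet and counting n-grams with gaps, can be sketched as follows. This is only an illustration under stated assumptions: the six-group alphabet `GROUPS`, the function names and the example hexapeptide are hypothetical choices made here, not the alphabet or code reported by the authors.

```python
from collections import Counter

# Hypothetical six-letter reduced alphabet (illustrative grouping only):
# each group label stands for a set of one-letter amino acid codes.
GROUPS = {
    "a": "G",
    "b": "KPR",
    "c": "ILV",
    "d": "FWY",
    "e": "ACHM",
    "f": "DENQST",
}


def encode(sequence, groups=GROUPS):
    """Recode an amino acid sequence into the reduced alphabet."""
    lookup = {aa: label for label, members in groups.items() for aa in members}
    return "".join(lookup[aa] for aa in sequence)


def gapped_ngrams(sequence, n, gaps):
    """Count n-grams whose consecutive elements are separated by the given gaps.

    `gaps` holds n - 1 non-negative integers; a gap of 0 means adjacent elements.
    """
    if len(gaps) != n - 1:
        raise ValueError("gaps must contain exactly n - 1 values")
    # Offsets of the n-gram elements relative to the start position.
    offsets = [0]
    for gap in gaps:
        offsets.append(offsets[-1] + gap + 1)
    span = offsets[-1] + 1
    counts = Counter()
    for start in range(len(sequence) - span + 1):
        counts["".join(sequence[start + o] for o in offsets)] += 1
    return counts


if __name__ == "__main__":
    reduced = encode("VQIVYK")  # example hexapeptide, chosen only for illustration
    print(reduced)
    print(gapped_ngrams(reduced, 2, (0,)))  # continuous 2-grams
    print(gapped_ngrams(reduced, 2, (1,)))  # 2-grams with one skipped position
```

Counts of this kind, collected for every peptide, would form the feature table on which a classifier such as a random forest is trained.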
∗ To whom the correspondence should be addressed: [email protected]
[1] Sawaya, M.R., Sambashivan, S., Nelson, R., Ivanova, M.I., Sievers, S.A., Apostol, M.I., Thompson, M.J., Balbirnie, M., Wiltzius, J.J.W., McFarlane, H.T., Nature, 2007, 447, 453–457.
[2] Wozniak, P.P., and Kotulska, M., Bioinformatics, 2015, 31, 3395–3397.

Selected aspects of the participation of tobacco smoke in the development of atherosclerosis modeled and analyzed using Petri nets
Kaja Chmielewska,1, ∗ Dorota Formanowicz,2 and Piotr Formanowicz1, 3
1 Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznań, Poland
2 Department of Clinical Biochemistry and Laboratory Medicine, Poznan University of Medical Sciences, Grunwaldzka 6, 60-780 Poznań, Poland
3 Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznań, Poland
Endothelial dysfunction is induced by various factors, including high blood pressure, a high glucose level (in diabetes), a high low-density lipoprotein (LDL) level and smoking in the wider sense of the term. For a better understanding of the influence of cigarette smoking on endothelial damage and on the stimulation of inflammation and prothrombotic states, a model of this complex biological process is presented. The proposed model has been built using Petri nets [3].
This Petri net structure includes the indirect effects of cigarette smoke on the impairment of endothelial function, caused by, among others: changes in the lipid profile (leading to an increase in the amount of LDL and then to LDL oxidation, which in turn promotes atherosclerosis); a decrease in the amount of tetrahydrobiopterin (BH4) (which inhibits nitric oxide (NO) synthesis, because BH4 is a cofactor of endothelial nitric oxide synthase, which is necessary for NO synthesis); and an increase in the amount of many other important factors that influence the development of unwanted processes (oxidative stress, oxidation of LDL, proliferation of vascular smooth muscle cells) [1]. In addition, smoking stimulates inflammation and prothrombotic states, which lead to the development of atherosclerosis. For this reason, the model includes the harmful effects of macrophages (which stimulate the development of plaque) and the dual role of NO (which depends on the amount of this molecule). The analysis of the proposed model has been based mainly on t-invariants. To determine the biological sense of the model, an analysis of clusters of t-invariants has been performed (cf. [2]). This research has been partially supported by the Polish National Science Centre grant No. 2012/07/B/ST6/01537.
∗ To whom the correspondence should be addressed: [email protected]
[1] J.A. Ambrose, R.S. Barua, The pathophysiology of cigarette smoking and cardiovascular disease: An update, Journal of the American College of Cardiology, 2004, 43, 1731-1737.
[2] D. Formanowicz, P. Formanowicz, T. Glowacki, A. Kozak, M. Radom, Hemojuvelin–hepcidin axis modeled and analyzed using Petri nets, Journal of Biomedical Informatics, 2013, 46, 1030-1043.
[3] T. Murata, Petri Nets: Properties, Analysis and Applications, Proceedings of the IEEE, 1989, 77, 541-580.
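The t-invariant analysis mentioned in the abstract above rests on a simple algebraic condition: a non-negative, non-zero integer vector x of transition firing counts is a t-invariant when C x = 0, where C is the incidence matrix of the net (places in rows, transitions in columns). The sketch below checks this condition on a hypothetical four-transition toy net; the net, the names and the brute-force search are assumptions made for illustration and have nothing to do with the atherosclerosis model itself.

```python
from itertools import product

# Hypothetical toy net: each transition maps to (input places, output places) with arc weights.
PLACES = ["p1", "p2", "p3"]
TRANSITIONS = {
    "t1": ({"p1": 1}, {"p2": 1}),
    "t2": ({"p2": 1}, {"p3": 1}),
    "t3": ({"p3": 1}, {"p1": 1}),
    "t4": ({"p2": 1}, {"p1": 1}),
}


def incidence_matrix(places, transitions):
    """Rows are places, columns are transitions: tokens produced minus tokens consumed."""
    names = list(transitions)
    return [
        [transitions[t][1].get(p, 0) - transitions[t][0].get(p, 0) for t in names]
        for p in places
    ]


def is_t_invariant(matrix, x):
    """True if x is a non-zero, non-negative firing count vector with C x = 0."""
    if not any(x) or any(v < 0 for v in x):
        return False
    return all(sum(c * v for c, v in zip(row, x)) == 0 for row in matrix)


def minimal_t_invariants(matrix, max_count=1):
    """Brute-force search over small firing counts; fine for toy nets only."""
    n = len(matrix[0])
    found = [x for x in product(range(max_count + 1), repeat=n) if is_t_invariant(matrix, x)]

    def dominated(x):
        # x is not minimal if another invariant is componentwise less than or equal to it.
        return any(y != x and all(b <= a for a, b in zip(x, y)) for y in found)

    return [x for x in found if not dominated(x)]


if __name__ == "__main__":
    C = incidence_matrix(PLACES, TRANSITIONS)
    print(C)
    print(minimal_t_invariants(C))
```

Each invariant found this way corresponds to a set of transitions whose joint firing returns the net to its starting marking, which is why t-invariants are read as subprocesses. Real models need dedicated algorithms rather than enumeration, since the search space grows exponentially with the number of transitions.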
CABS-dock web server for flexible docking of peptides to proteins

Maciej Pawel Ciemny,1, ∗ Mateusz Kurcinski,2 Maciej Blaszczyk,2 Andrzej Kolinski,2 and Sebastian Kmiecik2
1 Faculty of Physics, University of Warsaw, ul. Pasteura 5, 02-093 Warszawa, Poland
2 Faculty of Chemistry, University of Warsaw, ul. Pasteura 1, 02-093 Warszawa, Poland

Protein–peptide interactions play essential functional roles in living organisms, and their structural characterization is an active subject of current experimental and theoretical research. Protein–peptide molecular docking is a difficult modeling problem, especially if large-scale conformational changes of the receptor are involved. In this work, we present blind-docking results for protein–peptide interactions obtained using our CABS-dock method [1–3], which allows for large-scale conformational changes during on-the-fly docking. While most other algorithms require a pre-defined location of the binding site, CABS-dock does not require such knowledge. Given a protein receptor structure and a peptide sequence (and starting from random conformations and positions of the peptide), CABS-dock performs a simulation search for the binding site, allowing for full flexibility of the peptide and small fluctuations of the receptor backbone. We present example CABS-dock results obtained in the default CABS-dock mode and using its advanced options, which enable the user to increase the range of flexibility for chosen receptor fragments or to exclude user-selected binding modes from the docking search. The CABS-dock web server is available at http://biocomp.chem.uw.edu.pl/CABSdock/.

∗ To whom the correspondence should be addressed: [email protected]
[1] Kurcinski, M., M. Jamroz, M. Blaszczyk, A. Kolinski, and S. Kmiecik, CABS-dock web server for the flexible docking of peptides to proteins without prior knowledge of the binding site. Nucleic Acids Res., 2015, 43(W1): p. W419-24.
[2] Blaszczyk, M., M. Kurcinski, M. Kouza, L. Wieteska, A.
Debinski, A. Kolinski, and S. Kmiecik, Modeling of protein-peptide interactions using the CABS-dock web server for binding site search and flexible docking. Methods, 2016, 93: p. 72-83.
[3] Kurcinski, M., A. Kolinski, and S. Kmiecik, Mechanism of Folding and Binding of an Intrinsically Disordered Protein As Revealed by ab Initio Simulations. J. Chem. Theory Comput., 2014, 10(6): p. 2224-2231.

Computing maximal cliques on GPU – application to structural alignments

Pawel Daniluk,1, 2, 3, ∗ Tymoteusz Oleniecki,1, 2, 3, † and Grzegorz Firlik1, 2, 3
1 Department of Biophysics, Faculty of Physics, University of Warsaw, Żwirki i Wigury 93, 02-089 Warsaw, Poland
2 Bioinformatics Laboratory, Mossakowski Medical Research Centre, Polish Academy of Sciences, Pawińskiego 5, 02-106 Warsaw, Poland
3 College of Inter-Faculty Individual Studies in Mathematics and Natural Sciences, University of Warsaw, Żwirki i Wigury 93, 02-089 Warsaw, Poland

Finding the maximum clique of a graph is a well-known NP-hard problem. Several heuristics can be used to solve it; however, most of them are difficult to parallelize for implementation on GPUs. In this work we present a heuristic based on the Motzkin-Straus theorem [1], which relates the maximum clique problem to finding the maximum of a certain quadratic form. This heuristic has proven effective in several studies (e.g. [2]), but an efficient parallel implementation for graphics processors (GPUs) has not been demonstrated yet [3]. In the presented approach an iterative procedure based on replicator equations is used to seek local maxima. Optimizations that prevent convergence to local maxima (corresponding to suboptimal cliques) or to spurious solutions (not corresponding to any clique) are also proposed. The procedure has been implemented in CUDA C. Apart from the usual data parallelism typical of GPU programming, we have implemented pipelining to ensure that the most computationally intensive task (i.e.
vector-matrix multiplication) is not interrupted by less significant tasks (such as vector normalization or node elimination). Altogether, we achieved a 24-fold speedup over the CPU solution for random graphs with 32 thousand nodes. We present a proof-of-concept application of the Motzkin-Straus clique-finding heuristic to computing structural alignments of proteins. Local similarities are detected using local descriptors of protein structure [4]. The alignment is built by finding the largest set of non-contradicting local alignments, which are represented as cliques in a graph. The size of a clique corresponds directly to the size of the alignment (the number of aligned residues or long-distance contacts). This method is an improvement of our DEDAL algorithm [5].

These studies were supported by the research grant (DEC-2011/03/D/NZ2/02004) of the National Science Centre.

∗ To whom the correspondence should be addressed: [email protected]
† Presenting author: [email protected]
[1] T. S. Motzkin, E. G. Straus, Maxima for graphs and a new proof of a theorem of Turán, Canad. J. Math, 1965, 17.4, 533–540.
[2] I. M. Bomze, M. Budinich, M. Pelillo, C. Rossi, Annealed replication: A new heuristic for the maximum clique problem, Discrete Applied Mathematics, 2002, 121(1-3), 27–49.
[3] R. Cruz, N. Lopez, C. Trefftz, Parallelizing a Heuristic for the Maximum Clique Problem on GPUs and Clusters of Workstations, IEEE International Conference on Electro-Information Technology, EIT 2013, 2013.
[4] P. Daniluk, B. Lesyng, Theoretical and Computational Aspects of Protein Structural Alignment, in A. Liwo (Ed.), Computational Methods to Study the Structure and Dynamics of Biomolecules and Biomolecular Processes, Springer Berlin / Heidelberg, 2014, pp. 557–598.
[5] P. Daniluk, B. Lesyng, A novel method to compare protein structures using local descriptors, BMC Bioinformatics, 2011, 12(1), 344.
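The replicator-equation procedure described in the abstract above can be sketched in a few lines. By the Motzkin-Straus theorem, the maximum of x^T A x over the probability simplex equals 1 - 1/ω, where A is the adjacency matrix and ω is the size of the largest clique. The discrete replicator update x_i ← x_i (Ax)_i / (x^T A x) never decreases this quadratic form. The sketch below is a plain, sequential illustration on an assumed four-node graph; it is not the authors' CUDA C implementation and contains none of their optimizations (pipelining, node elimination, escape from local maxima). The function names and the support threshold are hypothetical.

```python
def quadratic_form(adjacency, x):
    """Value of x^T A x for a symmetric 0/1 adjacency matrix."""
    return sum(
        xi * sum(a * xj for a, xj in zip(row, x))
        for xi, row in zip(x, adjacency)
    )


def replicator_clique(adjacency, iterations=200):
    """Run discrete replicator dynamics from the barycenter of the simplex.

    Returns the support of the final vector (candidate clique) and the
    final value of the quadratic form.
    """
    n = len(adjacency)
    x = [1.0 / n] * n
    for _ in range(iterations):
        ax = [sum(a * xj for a, xj in zip(row, x)) for row in adjacency]
        value = sum(xi * axi for xi, axi in zip(x, ax))
        if value == 0.0:  # graph without edges: nothing to optimize
            break
        x = [xi * axi / value for xi, axi in zip(x, ax)]
    # Assumed threshold: keep nodes holding a non-negligible share of the weight.
    support = [i for i, xi in enumerate(x) if xi > 1.0 / (2 * n)]
    return support, quadratic_form(adjacency, x)


def is_clique(adjacency, nodes):
    """Check the candidate, since the dynamics may stop at a spurious solution."""
    return all(adjacency[i][j] == 1 for i in nodes for j in nodes if i != j)


# Assumed example graph: triangle 0-1-2 with a pendant node 3 attached to node 2.
ADJACENCY = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

if __name__ == "__main__":
    nodes, value = replicator_clique(ADJACENCY)
    print(nodes, is_clique(ADJACENCY, nodes), round(1.0 / (1.0 - value)))
```

The dominant cost per iteration is the matrix-vector product, which is the step the abstract identifies as the most computationally intensive on the GPU.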
Epigenetic modifications in RNA: Molecular dynamics studies

Indrajit Deb,1, 2, ∗ Joanna Sarzynska,1 and Ryszard Kierzek1
1 Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
2 Department of Biophysics, Molecular Biology & Bioinformatics, University of Calcutta, Kolkata 700009, West Bengal, India

FIG. 1. (A) RNA duplex with PSU and (B) RNA hairpin derived from the m6A-switch in MALAT1 with m6A modifications.

Epigenetic mechanisms of gene regulation, apart from chemical modifications to DNA, also include modifications to messenger RNA (mRNA) molecules. These modifications are dynamically and reversibly regulated by specific enzymes. The most abundant and epigenetically important internal modifications to RNA are pseudouridine (PSU) (Figure 1A) and N6-methyladenosine (m6A) (Figure 1B) [1, 2]. There is plenty of scope to investigate theoretically, and to validate experimentally, the structural consequences of these naturally occurring post-transcriptional modifications to RNA. To deal with this kind of problem, one needs an accurate set of force field parameters for both the standard and the modified residues. The currently available force field parameters for modified RNA residues [3] in the AMBER molecular modeling package show significant deviations in conformational properties from experimental observations [4]. An examination of the transferability of the recently revised torsion parameters revealed an overall improvement in the conformational properties for some of the modifications, but the improvements were still insufficient to describe the sugar pucker preferences [4]. Here, we report an approach for the development and fine-tuning of the AMBER force field parameters for modified RNAs. The χ torsion parameters were reparameterized at the individual nucleoside level.
The effects of combining the revised γ torsion parameter [5] and of modifying the Lennard-Jones σ parameters were also tested by directly comparing the conformational preferences obtained from our extensive molecular dynamics simulations with those from experimental observations [6].

∗ To whom the correspondence should be addressed: [email protected]
[1] S. Schwartz, D. A. Bernstein, M. R. Mumbach et al., Cell, 2014, 159, 148–162.
[2] N. Liu, Q. Dai, G. Zheng et al., Nature, 2015, 518, 560–564.
[3] R. Aduri, B. T. Psciuk, P. Saro et al., J. Chem. Theory Comput., 2007, 3, 1464–1475.
[4] I. Deb, J. Sarzynska, L. Nilsson et al., J. Chem. Inf. Model., 2014, 54, 1129–1142.
[5] A. Perez, I. Marchan, D. Svozil et al., Biophys. J., 2007, 92, 3817–3829.
[6] I. Deb, R. Pal, J. Sarzynska et al., J. Comput. Chem., 2016, DOI:10.1002/jcc.24374.

Computational study on hemagglutinin antigenic sites of influenza virus type A

Rafal Filip1, ∗ and Jacek Leluk1
1 University of Zielona Góra, Faculty of Biological Sciences, Department of Biological Sciences, Zielona Góra, Poland

Hemagglutinin (HA) is a glycoprotein located on the surface of influenza virions. HA mediates the attachment of the virus to the host cells being infected, which is one of the key processes in the influenza A virus replication cycle [1]. HA is also a target for the host immune system [2]. Each of the three HA subunits has five epitopes on its surface, described in the literature as Sa, Sb, Ca1, Ca2 and Cb [3, 4]. These antigenic sites, as well as the entire HA protein, are characterized by a high level of variability. This high mutation rate of HA causes seasonal epidemics and pandemics, because the host organism must acquire new humoral immunity [5]. In this study we analyzed the HA antigenic sites by searching for correlated mutations in aligned hemagglutinin sequences. Our original software (Corm) was used to identify and locate the clusters of correlated positions [6].
Several interrelationships between the positions were found that had not been reported previously. These data are potentially significant for explaining the characteristics of the amino acid mutations in the antigenic sites.

∗ To whom the correspondence should be addressed: [email protected]
[1] D. C. Wiley, J. J. Skehel, The Structure and Function of the Hemagglutinin Membrane Glycoprotein of Influenza Virus. Annual Review of Biochemistry, 1987, Volume 56, 365–394.
[2] J. J. Skehel, D. C. Wiley, Receptor Binding and Membrane Fusion in Virus Entry: The Influenza Hemagglutinin. Annual Review of Biochemistry, 2000, Volume 69, 531–569.
[3] S. M. Luoh, M. W. McGregor, V. S. Hinshaw, Hemagglutinin mutations related to antigenic variation in H1 swine influenza viruses. Journal of Virology, 1992, Volume 66, 1066–1073.
[4] M. Igarashi, K. Ito, R. Yoshida, D. Tomabechi, H. Kida, A. Takada, Predicting the Antigenic Structure of the Pandemic (H1N1) 2009 Influenza Virus Hemagglutinin. PLoS One, 2010, Volume 5(1), e8553.
[5] W. G. Laver, G. M. Air, R. G. Webster, S. J. Smith-Gill, Epitopes on protein antigens: misconceptions and realities. Cell, 1990, Volume 61, 553–556.
[6] A. Górecki, J. Leluk, B. Lesyng, Identification and free energy simulations of correlated mutations in proteins, RECOMB2005, Cambridge MA, USA, Abstracts, 2005.

New approach to de-novo genome assembly using string graph fork detection technique and longest contig path scaffolding method

Wojciech Frohmberg,1, ∗ Michal Kierzynka,1 Piotr Żurkowski,1 Pawel Wojciechowski,1 and Jacek Blażewicz1
1 Poznan University of Technology

When creating a method for genome assembly, there is a considerable temptation to simplify the problem: to forget about the global string graph structure and use only the local information provided by the reads produced by the sequencer.
After initial preprocessing of the data, such a naive method could perform a greedy traversal to yield read sequences that are as long as possible and thus satisfy the N50 [1] criterion of assembly quality. The consensus alignment of such a read sequence, however, would probably not even match the genome from which the reads were taken. This is because repeated genome fragments make the string graph structure full of cycles and parallel paths.

FIG. 1. Complex structure of the genome string graph [2]

This reasoning led us to consider a method that first finds only the reliable read sequences, i.e. those resulting in contigs that will undoubtedly match the genome the reads were taken from, and then runs further analysis to determine whether they can be joined without a loss of quality. To solve the problem of finding error-free contigs, we propose methods for discovering string graph forks. The poster gathers tests comparing our methods with the most commonly used de-novo genome assemblers. It also states the differences in the algorithm at each of its steps.

∗ To whom the correspondence should be addressed: [email protected]
[1] J.R. Miller, S. Koren, G. Sutton, Assembly algorithms for next-generation sequencing data, Genomics 95, 315–327, 2010.
[2] C. T. Skennerton, M. Imelfort, and G. W. Tyson, Crass: identification and reconstruction of CRISPR from unassembled metagenomic data, Nucleic Acids Research, p. 183, 2013.

Predicting the age status of somatic mutations by Gaussian mixture decomposition

Mateusz Garbulowski1, 2, ∗ and Andrzej Polański1
1 Institute of Computer Science, Silesian University of Technology, Gliwice, Poland
2 Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland

The evolution of cancer tissues is explained by the clonal theory. Therefore, estimating the clonal structure of cancer cell populations allows for predicting tumor growth properties and responses to therapies.
Next generation sequencing (NGS) techniques applied to tumor tissues allow new scenarios for researching the evolution of cancer tissues to be developed. Reading the DNA sequences of tumor and normal cells and comparing them allows sets of somatic driver mutations, which are related to tumor clonal growth, to be discovered. Numerous studies devoted to the development of methods for analyzing NGS cancer genomics data are appearing in the literature. In the presented study we show a methodology for the analysis of whole exome sequencing (WES) data oriented towards the discovery of driver mutations and the estimation of their age status: early (clonal) or late (subclonal). We accept the hypothesis [1] that multiple passenger mutations accumulate in a cancer cell before a driver mutation causes a clonal expansion. We have designed a system for WES data analysis of glioblastoma multiforme (GBM) cancer cells, which leads to the discovery of lists of driver and passenger somatic mutations. We decompose the obtained histograms of variant allele frequencies (VAF) into a Gaussian mixture of probability distributions, and we estimate the age status of driver and passenger mutations, measured from the time of tumor origination, by the scaled weights of the Gaussian components. We also show that the proportion of subclonal to clonal somatic mutations is a strong predictor of patient survival.

∗ To whom the correspondence should be addressed: [email protected]
[1] A. Sottoriva et al., Intratumor heterogeneity in human glioblastoma reflects cancer evolutionary dynamics, Proceedings of the National Academy of Sciences, 2013, 110, 4009–4014.

Complete Petri Net Study of Medically Relevant Interactions in the Immune System Model

Anna Gogolinska1, 2, ∗ and Wieslaw Nowak2
1 Faculty of Mathematics and Computer Science in Toruń, Poland
2 Institute of Physics, Nicolaus Copernicus University in Toruń, Poland

The immune system (IS) is our defense against pathogens [1].
The immune response is very complex and can be divided into many parts and types. A complete study of the whole immune response, or even of a large part of it, is very difficult and requires appropriate methods. Mathematical models of the IS are sought in current systems biology studies, and such modeling may hopefully be useful in medicine as well. Petri nets (PNs) [2] are one of the mathematical modeling languages. A PN has the form of a bipartite graph with two kinds of nodes: places and transitions. Transitions can represent actions or events, while places are suitable for describing objects or elements and can also contain tokens. Dynamics in PNs is obtained by transitions transferring tokens from their input places to their output places. The PN idea is very simple and, at the same time, very flexible and universal. We have used PNs to create a model of the human IS [3]. The model represents a large part of the immune response and consists of five parts: the reaction of DC cells, Th lymphocyte activation, the cellular response, the humoral response and the reaction of macrophages. The model is rather comprehensive: it consists of about 100 places and 100 transitions. To the best of our knowledge, it is the most complex and complete model of the IS to date. Furthermore, the PN formalism allows previously created Petri nets to be modified easily. The model was used to study different pathological phenomena: fever, Autism Spectrum Disorder, HIV infection, Adult-Onset Immunodeficiency Syndrome (AOIS) and ageing of the IS. Those phenomena were added in different ways, appropriate to the type of changes induced in the IS, for example by adding a new part of the network, creating new transitions or modifying the weights in the PNs. For every studied case PN simulations were performed and the behavior of the model was monitored. The time evolution of some components of the IS will be presented in the poster as selected plots. The whole IS model will also be outlined.
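The token mechanics described above (places holding tokens, transitions moving them from input places to output places) can be illustrated with a minimal sketch. The single transition, the place names and the arc weights below are assumptions chosen for illustration; they are not part of the authors' immune system model, which has about 100 places and 100 transitions.

```python
def is_enabled(marking, transition):
    """A transition is enabled when every input place holds at least the arc weight."""
    return all(
        marking.get(place, 0) >= weight
        for place, weight in transition["inputs"].items()
    )


def fire(marking, transition):
    """Return the new marking after firing; the original marking is left unchanged."""
    if not is_enabled(marking, transition):
        raise ValueError("transition is not enabled in this marking")
    new_marking = dict(marking)
    for place, weight in transition["inputs"].items():
        new_marking[place] = new_marking.get(place, 0) - weight
    for place, weight in transition["outputs"].items():
        new_marking[place] = new_marking.get(place, 0) + weight
    return new_marking


# Hypothetical transition: one antigen token and one naive Th cell token
# are consumed, one activated Th cell token is produced.
TH_ACTIVATION = {
    "inputs": {"antigen": 1, "naive_Th": 1},
    "outputs": {"activated_Th": 1},
}

INITIAL_MARKING = {"antigen": 2, "naive_Th": 1, "activated_Th": 0}

if __name__ == "__main__":
    after = fire(INITIAL_MARKING, TH_ACTIVATION)
    print(after)
    print(is_enabled(after, TH_ACTIVATION))
```

Modifying a model in this representation amounts to adding entries to the transition list or changing arc weights, which mirrors the kinds of modification the abstract lists.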
∗ To whom the correspondence should be addressed: [email protected]
[1] J. Parkin and B. Cohen, "An overview of the immune system," The Lancet, vol. 357, pp. 1777–1789, 2001.
[2] W. Reisig, Understanding Petri Nets: Springer, 2013.
[3] A. Gogolinska and W. Nowak, "Petri Nets Approach to Modeling of Immune System and Autism," in Artificial Immune Systems, vol. 7597, C. Coello Coello, et al., Eds.: Springer Berlin / Heidelberg, 2012, pp. 86–99.

Modification of the accessibility of the human soluble epoxide hydrolase active site, in silico study

Sandra Goldowska,1, ∗ Karolina Markowska,1 Alicja Pluciennik,1 and Artur Góra1
1 Tunneling Group, Biotechnology Centre, Silesian University of Technology, ul. Krzywoustego 8, 44-100 Gliwice, Poland

A gate can reversibly control the access of various molecules transported through tunnels to and from the active site of an enzyme. The open or closed conformation of a gate might be stabilized by anchoring residues. This mechanism can be crucial for enzyme selectivity and/or specificity, and can regulate the rate-determining step of catalysis [1]. The aim of this study was to: i) identify gates and anchoring residues in human soluble epoxide hydrolase; ii) propose and investigate changes which can modify the access to the active site of the selected enzyme. The active site of human soluble epoxide hydrolase is buried inside the protein structure and connected with the environment by tunnels. Epoxide hydrolases catalyze the conversion of epoxides to their corresponding diols by water addition. These enzymes play an important role in the proper functioning of organisms because they are involved in drug metabolism and the detoxification of xenobiotics [2]. To achieve the proposed goals and to investigate how the tunnel dynamics can be controlled, an in silico study of human soluble epoxide hydrolase was performed. The Amber14 package was used to run and analyse a 50 ns molecular dynamics simulation [3].
Caver 3.02 software was used for tunnel identification [4]. Gating and anchoring residues were detected by analysing changes in amino acid conformations. The analysis identified the Phe497 residue as a gate, which modifies the throughput and opening of the main tunnels providing access to the active site, and His524, which is part of the active site, as an anchoring residue. The rational design of mutants aimed at modifying the access/exit control system was based on the results of the in silico study and on an analysis of amino acid conservation at preselected positions. The proposed mutations were introduced with FoldX, and a combination of MD simulations and CAVER was used to analyse the changes in the designed variants.

The work is supported by a grant SONATA-BIS 2013/10/E/NZ1/00649 financed by the National Science Centre Poland (www.ncn.gov.pl).

∗ To whom the correspondence should be addressed: [email protected]
[1] A. Góra, J. Brezovsky, J. Damborsky, Chemical Reviews, 2013, 113, 5871–5923.
[2] R. Thalji, J. McAtee, S. Belyanskaya, M. Brandt, G. Brown, M. Costell, Y. Ding, J. Dodson, S. Eisennagel, R. Fries, J. Gross, M. Harpel, D. Holt, D. Israel, L. Jolivette, D. Krosky, H. Li, Q. Lu, T. Mandichak, T. Roethke, C. Schnackenberg, B. Schwartz, L. Shewchuk, W. Xie, D. Behm, S. Douglas, A. Shaw, J. Marino, Bioorganic & Medicinal Chemistry Letters, 2013, 23, 3584–3588.
[3] D.A. Case, V. Babin, J.T. Berryman, R.M. Betz, Q. Cai, D.S. Cerutti, T.E. Cheatham III, T.A. Darden, R.E. Duke, H. Gohlke, A.W. Goetz, S. Gusarov, N. Homeyer, P. Janowski, J. Kaus, I. Kolossvary, A. Kovalenko, T.S. Lee, S. LeGrand, T. Luchko, R. Luo, B. Madej, K.M. Merz, F. Paesan, D.R. Roe, A. Roitberg, C. Sagui, R. Salomon-Ferrer, G. Seabra, C.L. Simmerling, W. Smith, J. Swails, R.C. Walker, J. Wang, R.M. Wolf, X. Wu, P.A. Kollman, AMBER 14, University of California, San Francisco, 2014.
[4] E. Chovancová, A. Pavelka, P. Benes, O. Strnad, J. Brezovsky, B. Kozlikova, A. Gora, V.
Sustr, M. Klvana, P. Medek, L. Biedermannova, J. Sochor, J. Damborsky, PLoS Computational Biology, 2012, 8(10), e1002708.

New method of Sholl analysis

Judyta Jablońska1, 2, ∗ and Zbigniew Soltys2
1 Zaklad Neuroanatomii, Instytut Zoologii Uniwersytetu Jagiellońskiego
2 Kolo Naukowe Bioinformatyki Uniwersytetu Jagiellońskiego

The nervous system consists of neuronal and glial cells, which stand out from other cellular forms because they have numerous, more or less branched processes. Neurobiological research has revealed that the complexity and variability of cellular morphology influence the mode of information processing. Changes in the structure of such processes are visible under physiological conditions, but they may also indicate symptoms of various disorders, from neurodegenerative diseases to psychiatric disturbances. Numerous methods for the quantitative analysis of neuronal and glial morphology have been developed so far, including the oldest but still most commonly used one, Sholl analysis. Several solutions have been proposed to perform it semi-automatically or fully automatically, but each of them has its own limitations. To overcome some of these problems, we propose a simple, open-source, fast and automatic method of Sholl analysis, which is applied to digital 3D reconstructions of cell shapes. In addition, this method allows us to define a new parameter describing cell morphology, termed 'straightness'.

FIG. 1. Sholl analysis is based on calculating the number of intersections against the radial distance from the soma centre and results in the graph called Sholl profile.

∗ To whom the correspondence should be addressed: [email protected]
[1] K.E. Binley, W.S. Ng, J.R. Tribble, B. Song and J.E. Morgan (2014) Sholl analysis: a quantitative comparison of semi-automated methods, Journal of Neuroscience Methods 225: 65–70.
[2] R.C. Cannon, M.-O. Gewaltig, P. Gleeson, U.S. Bhalla, H. Cornelis, M.L. Hines, F.W. Howell, E. Muller, J.R. Stiles, S. Wils and E.
de Schutter (2007) Interoperability of Neuroscience Modeling Software: Current Status and Future Directions, Neuroinform 5: 127–138.
[3] T.A. Ferreira, A.V. Blackman, J. Oyrer, S. Jayabal, A.J. Chung, A.J. Watt, P. Jesper Sjöström and D.J. van Meyel (2014) Neuronal morphometry directly from bitmap images, Nature Methods 11: 982–984.
[4] J.C. Gensel, D.L. Schonberg, J.K. Alexander, D.M. McTigue and P.G. Popovich (2010) Semi-automated Sholl analysis for quantifying changes in growth and differentiation of neurons and glia, Journal of Neuroscience Methods 190: 71–79.
[5] H. Gutierrez and A.M. Davies (2007) A fast and accurate procedure for deriving the Sholl profile in quantitative studies of neuronal morphology, Journal of Neuroscience Methods 163: 24–30.

Over-minimization in molecular dynamics simulations

R. Jakubowski,1, ∗ J. Rydzewski,1 and W. Nowak1
1 Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Torun, Poland

Molecular dynamics (MD) is a routinely used method that allows the conformational landscape of molecules to be explored. For proteins, the commonly used protocol usually includes an energy minimization via geometry optimization. However, there are no clear indications of how many minimization steps one should choose, and the tendency is to set a high number, as most algorithms are computationally cheap these days. In this work we present the results of our research on one of the most popular minimization algorithms, the conjugate gradient method (CGM). We explain the term "over-minimization", which denotes an unwanted growth of the entropic contributions when too many steps of CGM calculations are performed. We also provide a new measure, based on the Pareto front of total entropy, which facilitates the choice of the right structure from the minimization process in terms of thermodynamic properties. This work was published in [1].
∗ To whom the correspondence should be addressed: [email protected]
[1] J. Rydzewski, R. Jakubowski, W. Nowak, J. Chem. Phys., 2015, 143, 171103.

Influence of the artificial protein nanotube on a cell membrane

Patryk Kosiorek1, ∗ and Lukasz Peplowski1, †
1 Theoretical Molecular Biophysics Group; Institute of Physics; Faculty of Physics, Astronomy and Informatics; Nicolaus Copernicus University; Grudziadzka 5, 87-100 Torun, Poland

Nanotechnology provides us with information about the mechanical properties of nanotubes, both carbon and protein, and about their diverse influence on the cell membrane. Many studies address the clasping, stretching and piercing of lipid membranes [1]. The aim of this work is to create a new artificial Protein NanoTube (PNT) model based on a structure present in nature [2, 3], and to characterize its interactions with a model cell membrane. A homemade script was used to build the PNT. Model formation relies on a fragment duplication method based on a PDB structure. The model was used to pierce the cell membrane and to pull the embedded PNT out of it. Our simulations show that the PNT can pierce the cell membrane with a force of about 1800 pN, but in the last phase of membrane piercing the protein starts to unfold. When the embedded PNT is removed from the membrane, the applied forces are smaller, about 1000 pN. In such simulations the protein preserves its native conformation. The entire experiment was performed using steered molecular dynamics simulations [4], the NAMD 2.10 code [5] and the CHARMM force field [6].

∗ To whom the correspondence should be addressed: [email protected]
† To whom the correspondence should be addressed: [email protected]
[1] R.Garcia-Fandino, J.L.Trick, A.Pineiro, M.S.P.Sansom, ACSNano, 2016, Volume 10, 3693–3701.
[2] S.Heten, M.J.Buehler, Volume 197, 2008, Pages 3203–3214.
[3] K.Brown, F.Pompeo, S.Dixon, D.Mengin-Lecreulx, Ch.Cambillau, Y.Bourne, 1999, Volume 18, 4096–4106.
[4] H.Lu, B.Isralewitz, A.Krammer, V.Vogel, K.Schulten, 1998, Biophys J., 75, 662–71.
[5] J.C.Phillips, R.Braun, W.Wang, J.Gumbart, E.Tajkhorshid, E.Villa, Ch.Chipot, R.D.Skeel, L.Kale, and K. Schulten. J.Comp Chem, 2005, 26:1781–1802.
[6] A.D.MacKerell, Jr., B.Brooks, C.L.Brooks, III, L. Nilsson, B. Roux, Y. Won, and M. Karplus, The Encyclopedia of Comp. Chem, 1998, 271–277.

Searching for structural patterns in the vicinity of microRNA in plants

Joanna A. Kowalska,1, ∗ Katarzyna Tomczyk,1 Agnieszka Mickiewicz,2 Joanna Sarzynska,2 and Marta Szachniuk1, 2
1 Institute of Computing Science & European Centre for Bioinformatics and Genomics, Poznan University of Technology
2 Institute of Bioorganic Chemistry PAS

MicroRNA (miRNA), an evolutionarily ancient molecule of about 19-24 nt, has been attracting the attention of scientists over the last few decades. So far, it has been recognized as a regulator of gene expression at the post-transcriptional level. In plants, miRNA plays a key role in the response to stress conditions, such as sudden changes of temperature, drought or nutrient deficiency, and it participates in the processes of growth and development. The biogenesis of both plant and animal miRNA starts in the nucleus. The hairpin loop structure of animal pre-microRNA is exported to the cytoplasm, where it is cleaved to form the miRNA-miRNA* duplex. In contrast, plant miRNA is exported to the cytoplasm after the duplex has been created. The other significant difference between miRNA in animals and plants lies in the enzymes mediating the process of miRNA maturation. Dicer, an RNase III enzyme, serves as a 'molecular ruler' in animals. After recognizing the 5' end and/or the 2-nt 3' overhang, Dicer measures a distance of 22 nt and cleaves the duplex [1]. Dicer Like 1 (DCL1), a homologue of Dicer, is responsible for cutting out the miRNA-miRNA* duplex in plants.
The mechanism of action of the latter enzyme still remains unclear. A preliminary analysis of sequence, secondary and tertiary structures could help discover how plant microRNAs are recognized by an enzyme within their precursors. Herein, we present a bioinformatics approach aimed at supporting the understanding of plant microRNA biogenesis. Using a set of available bioinformatics tools, among others WebLOGO [2], RNAstructure [3], MCQ4Structures [4], RNAComposer [5] and Swiss PDB Viewer [6], together with our own scripts, we try to identify structural patterns in the vicinity of miRNA in plants. In the search for patterns we consider the primary, secondary and tertiary structures of available pre-miRNAs, focusing on the neighborhood of the miRNA-miRNA* duplex. We present potential motifs identified in the sequence and secondary structure. We also demonstrate the first results of tertiary structure analysis based on predicted 3D models of miRNA precursors.

∗ To whom the correspondence should be addressed: [email protected]
[1] Ha M, Kim N, Regulation of microRNA biogenesis, Nature Reviews Molecular Cell Biology, 2014, 15, 509–524.
[2] Crooks GE, Hon G, Chandonia JM, Brenner SE, WebLogo: A sequence logo generator, Genome Research, 2004, 14, 1188–1190.
[3] Reuter J S, Mathews D H, RNAstructure: software for RNA secondary structure prediction and analysis, BMC Bioinformatics, 2010, 11.
[4] Zok T, Popenda M, Szachniuk M, MCQ4Structures to compute similarity of molecule structures, Central European Journal of Operations Research, 2014, 22, 457–473.
[5] Popenda M, Szachniuk M, Antczak M, Purzycka K J, Lukasiak P, Bartol N, Blazewicz J, Adamiak R W, Automated 3D structure composition for large RNAs, Nucleic Acids Research, 2012, 14.
[6] Guex N, Peitsch MC, SWISS-MODEL and the Swiss-PdbViewer: An environment for comparative protein modeling, Electrophoresis 18, 1997, 15, 2714–2723.

PyRosetta energy terms as indicators for protein mirror models

Monika Kurczyńska,1, ∗ Bogumil M.
Konopka,1 and Malgorzata Kotulska1
1 Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland

The number of known protein sequences in May 2016 was 64,000,000, which was 21 times higher than 10 years earlier [1]. Over the same period the number of protein structures in the Protein Data Bank [2] increased only 3-fold and currently equals 110,000. To decrease the disparity between the numbers of known primary and tertiary protein structures, protein structure modelling methods are being developed. Tools for protein structure reconstruction from a contact map generate a collection of models containing both properly oriented models and mirror models, because all of them share the same contact map. Properly oriented protein models and mirror models can constitute competitive forms in nature. Our main goal is to identify indicators that could be useful for distinguishing the mirror models without a priori knowledge about the structure. We assumed that some of the PyRosetta energy terms would be significantly different for mirror models than for properly oriented models. In our work we used protein models that were reconstructed from contact maps of experimental SCOP domains [3] with our tool, the C2S pipeline [4]. The original SCOP domains are organized in classes based on the similarity of their secondary structures. We investigated 100 models for each of 1305 domains. With Biopython [5] we calculated structural features of the models, and with PyRosetta [6] we computed the energy terms, whose linear combination is the total energy of the model. The C2S pipeline generates mirror models and properly oriented models with the same probability of 0.5. However, for some domains the percentage of mirror models was lower than 5% or higher than 95%. The structural quality of the properly oriented models and the mirror models is comparable.
The mean RMSD of the properly oriented models compared to the original SCOP structures was 5.6 Å with a standard deviation of 5.4 Å, while the mean RMSD of the mirror models compared to the ideal mirror images of the original SCOP structures was 5.6 Å with a standard deviation of 5.5 Å. In all-alpha domains the energy term describing electrostatic energy (hack_elec) was the most reliable indicator for distinguishing properly oriented from mirror models (for 77% of domains). In addition, the energy terms related to the probability of an amino acid at given backbone dihedral angles φ and ψ (p_aa_pp) and to the Ramachandran preferences (rama) were statistically different for 68% and 64% of domains, respectively. Despite the intuition that mirror images of proteins rich in alpha-helices are easier to identify, we observed more energy terms which were significantly different for more than 75% of domains. These energy terms were again rama and p_aa_pp, and additionally the attractive and repulsive portions of the Lennard-Jones potential (fa_atr, fa_rep) and the Lazaridis-Karplus solvation energy (fa_sol).
∗ To whom the correspondence should be addressed: [email protected]
[1] The UniProt Consortium, Nucleic Acids Res, 2015, 43, D204-D212.
[2] H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne, Nucleic Acids Res, 2000, 28, 235-242.
[3] A.G. Murzin, S.E. Brenner, T. Hubbard, C. Chothia, J Mol Biol, 1995, 247, 536–540.
[4] B.M. Konopka, M. Ciombor, M. Kurczynska, M. Kotulska, J Membr Biol, 2014, 247, 409–420.
[5] P.J. Cock, T. Antao, J.T. Chang, B.A. Chapman, C.J. Cox, A. Dalke, I. Friedberg, T. Hamelryck, F. Kauff, B. Wilczynski, M.J. de Hoon, Bioinformatics, 2009, 25, 1422–1423.
[6] S. Chaudhury, S. Lyskov, J.J. Gray, Bioinformatics, 2010, 26, 689–691.
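The statement that properly oriented and mirror models share one contact map can be illustrated with a small sketch. It is not part of the C2S pipeline or PyRosetta; the function names, the 8 Å cutoff, the toy helix geometry and the signed-volume handedness measure are all assumptions chosen for illustration. Reflecting the coordinates leaves every pairwise distance unchanged, so the contact map is identical, while a chirality-sensitive quantity changes sign.

```python
import numpy as np

def contact_map(coords, cutoff=8.0):
    """Boolean contact map from an (N, 3) array of points (cutoff is an assumed value)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return np.less(dist, cutoff)

def mirror(coords):
    """Mirror image obtained by reflecting through the x = 0 plane."""
    reflected = coords.copy()
    reflected[:, 0] *= -1.0
    return reflected

def handedness(coords):
    """Sum of signed volumes spanned by consecutive bond vectors (illustrative measure)."""
    a = coords[1:-2] - coords[:-3]
    b = coords[2:-1] - coords[1:-2]
    c = coords[3:] - coords[2:-1]
    return float(np.sum(np.einsum("ij,ij->i", np.cross(a, b), c)))

# Toy right-handed helical trace: 100 degrees per residue, 1.5 A rise (hypothetical data).
t = np.arange(20) * np.deg2rad(100.0)
helix = np.column_stack([2.3 * np.cos(t), 2.3 * np.sin(t), 1.5 * np.arange(20)])
helix_mirror = mirror(helix)

same_contacts = np.array_equal(contact_map(helix), contact_map(helix_mirror))
print(same_contacts, handedness(helix), handedness(helix_mirror))
```

The contact maps agree exactly, which is why a reconstruction from contacts alone cannot choose between the two forms; any indicator has to depend on handedness, as the energy terms discussed above do.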
Application of molecular docking for predicting protein substrate specificity
Michal Laźniewski,1, 2, ∗ Krzysztof Kuchta,1 Dariusz Plewczyński,3 and Krzysztof Ginalski1
1 Laboratory of Bioinformatics and Systems Biology, Centre of New Technologies, University of Warsaw, Poland
2 Department of Physical Chemistry, Faculty of Pharmacy, Medical University of Warsaw, Poland
3 Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Poland
In recent years, the application of next generation sequencing has brought a substantial increase in the number of known protein sequences. The RefSeq database alone now contains more than 60 million unique sequences, 50 times more than only ten years ago. For such an overwhelming amount of data, functional assignment using only experimental techniques is a challenging task. With structural genomics providing thousands of new protein structures [1], the additional information might be utilized by structure-based methods to improve the quality of bioinformatics predictions. Thus molecular docking might prove invaluable where other theoretical techniques fail to predict the function of an analyzed protein. It has already been shown that molecular docking can frequently predict the correct conformation of a small compound in a protein-ligand complex; however, calculating the binding energy remains a challenge [2]. On the other hand, with most efforts focused on studying the interactions between proteins and drug-like inhibitors, only limited emphasis has been put on analyzing proteins and their in vivo small molecule partners [3]. The goal of our work was a comprehensive analysis of the performance of various molecular docking algorithms in wide-scale predictions of substrate specificity of proteins from a single organism, Escherichia coli. We analyzed all E.
coli enzymes for which the crystal structure was solved together with their cognate partner (the substrate or product of the reaction). Specifically, two sets of molecules were docked to each enzyme: (a) compounds present in the active sites of all selected proteins and (b) the entire metabolome of E. coli. The performance of four different programs (GOLD, eHiTS, Surflex and Glide) was tested, including both the ability to identify the enzyme's cognate ligand and the ability to correctly predict the ligand conformation. Moreover, we discuss whether applying machine learning methods could further increase the quality of such structure-based functional predictions.
∗ To whom the correspondence should be addressed: [email protected]
[1] J. Weigelt, Exp Cell Res, 2010, 316(8), 1332-1338.
[2] D. Plewczynski, M. Lazniewski, et al., J Comput Chem, 2011, 32(4), 742-755.
[3] A. Macchiarulo, I. Nobeli, et al., Nat Biotechnol, 2005, 22(8), 1039-1045.

PANTA RHEI: analysis of solvent flow in MD simulations
Tomasz Magdziarz,1, ∗ Karolina Markowska,1 Sandra Goldowska,1 and Artur Góra1
1 Tunneling Group, Biotechnology Centre, Silesian University of Technology, 44-100 Gliwice, Poland
The proper action of enzymes depends on a multitude of factors, for example temperature, pH, or the solvent, which usually consists of water and some ions. The solvent is an especially interesting factor because it contributes to the catalytic stability, activity and selectivity of proteins. Natural evolution has developed a variety of mechanisms regulating water access to the active site. The division of protein cores into hydrophobic and hydrophilic compartments can separate processes requiring distinct dielectric conditions. In enzymes with a buried active site connected with the surrounding solvent by tunnels, the water flow can be controlled by the molecular properties of the amino acids forming the tunnels or, in more sophisticated enzymes, by gates controlling the opening and closing of the access pathways.
Detailed information about the flow of molecules through the tunnel network can significantly improve our understanding of the mechanisms controlling enzyme activity. In recent years several tools for tunnel identification have been developed. The most recent, such as CAVER 3.0 [1] or MOLE 2.0 [2], can facilitate the analysis of molecular dynamics simulations and make it possible to gather precise information about the geometry of detected pathways and their evolution in time. However, knowledge of the geometrical properties of existing tunnels, approximated by spheres penetrating empty space in proteins, can only suggest possible entry and exit routes for solvent molecules (and ligands). Parameters such as the length of a tunnel, its diameter and even the properties of the amino acids that build the tunnel do not allow for easy identification of the major factors controlling the flow of water molecules. Here we present a novel tool for the analysis of solvent flow in molecular dynamics simulations. AQUEDUCT, a software package developed in our group, allows extraction, analysis and visualization of the behavior of solvent molecules during the entire simulation. Enzymes in explicit solvent MD simulations are immersed in a water box containing thousands of water molecules. Analysis of this bulk water can provide, for example, insight into the overall distribution and density of water molecules. AQUEDUCT, on the other hand, takes a different approach. It traces the particular water molecules that enter or interact with the active site. This leads to a complete picture of solvent flow from, to, and around the active site. Moreover, it allows for a better understanding of the factors that control the flow of water molecules and their impact on enzyme activity. Applications of AQUEDUCT are not limited to water molecules. It is a universal tool and, together with the previously described tools, can provide a comprehensive description of the protein tunnel network and its accessibility and usage by different molecules.
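The tracing idea described above, following only those solvent molecules that visit the active site instead of analyzing bulk water, can be sketched in a few lines. This is a minimal illustration and not the AQUEDUCT API: the function name, the spherical definition of the active site, the radius and the toy trajectory are all hypothetical.

```python
import numpy as np

def trace_visitors(traj, center, radius):
    """Return {molecule index: frames spent inside the site sphere}.

    traj   : array of shape (n_frames, n_molecules, 3), one point per molecule
    center : (3,) assumed position of the active site
    radius : assumed radius of the sphere representing the active site
    """
    dist = np.linalg.norm(traj - np.asarray(center, dtype=float), axis=-1)
    inside = np.less_equal(dist, radius)           # (n_frames, n_molecules)
    visitors = np.flatnonzero(inside.any(axis=0))  # molecules that ever enter
    return {int(i): np.flatnonzero(inside[:, i]).tolist() for i in visitors}

# Toy trajectory: molecule 0 enters and leaves the site, molecule 1 stays in bulk.
traj = np.array([
    [[5.0, 0.0, 0.0], [9.0, 0.0, 0.0]],
    [[2.0, 0.0, 0.0], [9.0, 0.0, 0.0]],
    [[0.5, 0.0, 0.0], [8.0, 0.0, 0.0]],
    [[3.0, 0.0, 0.0], [8.5, 0.0, 0.0]],
])
visits = trace_visitors(traj, center=(0.0, 0.0, 0.0), radius=2.5)
print(visits)
```

Once the visiting molecules are known, their full paths before and after the visit can be extracted from the same trajectory array, which is the kind of per-molecule picture the abstract contrasts with density analysis of bulk water.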
The work is supported by National Science Centre Poland grant SONATA-BIS 2013/10/E/NZ1/00649.
∗ To whom the correspondence should be addressed: [email protected]
[1] E. Chovancova, A. Pavelka, P. Benes, O. Strnad, J. Brezovsky, B. Kozlikova, A. Gora, V. Sustr, M. Klvana, P. Medek, L. Biedermannova, J. Sochor, J. Damborsky, PLoS Comput Biol, 2012, 8, e1002708
[2] D. Sehnal, R. Svobodová Vařeková, K. Berka, L. Pravda, V. Navrátilová, P. Banáš, C.-M. Ionescu, M. Otyepka, J. Koča, J Cheminform, 2013, 5, 1-13

NGS-based analysis of copy number variations in various cattle breeds
M. Mielczarek,1, 2, ∗ M. Fraszczak,1 E. L. Nicolazzi,3 G. Minozzi,3 H. Schwarzenbacher,4 C. Egger-Danner,4 D. Vicario,5 F. Seefried,6 A. Rossoni,7 T. Solberg,8 L. Varona,9 C. Diaz,9, 10 C. Ferrandi,3 R. Giannico,3 J. L. Williams,11 J. Woolliams,12 and J. Szyda1, 2
1 Biostatistics group, Wroclaw University of Environmental and Life Sciences; Wroclaw, Poland
2 National Research Institute of Animal Production; Cracow-Balice, Poland
3 Fondazione Parco Tecnologico Padano; Lodi, Italy
4 ZuchtData EDV-Dienstleistungen GmbH; Vienna, Austria
5 Italian Simmental Cattle Breeders Association; Udine, Italy
6 Swiss Brown Cattle Breeders Federation; Zug, Switzerland
7 Italian Brown Cattle Breeders' Association; Bussolengo, Italy
8 Norwegian University of Life Sciences; As, Norway
9 Universidad de Zaragoza; Zaragoza, Spain
10 Instituto Nacional de Investigación Agropecuaria; Madrid, Spain
11 University of Adelaide; Roseworthy, Australia
12 Roslin BioCentre; Roslin, UK
Whole genome DNA sequences were determined for 104 bulls representing Brown Swiss (48 individuals), Fleckvieh (30), Guernsey (20), Simmental (16) and Norwegian Red (23) breeds. The total number of raw reads obtained for a single animal varied between 270,678,710 (a Fleckvieh) and 768,980,700 (a Brown Swiss). The average genome coverage ranged from 10 to 28. Alignment to the UMD3.1 reference genome was carried out using BWA-MEM.
CNV calling was performed with the CNVnator software. The number of duplications per individual varied between 2,204 and 48,501, while the number of deletions varied between 9,771 and 24,334. Six deletions located on chromosomes 6, 7, 17, 21, 23 and 29 were shared among all animals, while no duplications were shared among all of them. The length of CNVs varied from 200 bp to 999,300 bp for deletions, and from 200 bp to 925,000 bp for duplications. In conclusion, significant variation in genome structure is observed both within and among breeds. The research was carried out within the EU "Gene2Farm" project (7FP grant No. 289592) and Polish National Science Centre grant No. UMO-2014/15/N/NZ9/03914. Processing of the raw data was performed at the Poznan Supercomputing and Networking Center.
∗ To whom the correspondence should be addressed: [email protected]

Theoretical study of influence of mutations on superoxide dismutase SOD1 dimer by molecular dynamics simulations
Przemyslaw Miszta,1, ∗ Cezary Żekanowski,2 Jakub Fichna,2 Michalina Kosiorek,2 and Slawomir Filipek1
1 Faculty of Chemistry & Biological and Chemical Research Centre, University of Warsaw, Warsaw, Poland
2 Department of Neurodegenerative Disorders, Mossakowski Medical Research Centre, Polish Academy of Sciences, Warszawa, Poland
The human superoxide dismutase [Cu-Zn], also known as superoxide dismutase 1 or SOD1, plays a very important role in the regulation of apoptosis in cells. In humans the enzyme is encoded by the SOD1 gene, located on chromosome 21. Mutations in this gene have been implicated as causes of familial amyotrophic lateral sclerosis (fALS). The mechanism by which mutant SOD1 exerts toxicity remains unknown. SOD1 is an important antioxidant defense in nearly all living cells exposed to oxygen.
The structure of human SOD1, a dimer containing four binding sites for metal ions (two for Zn2+ and two for Cu2+), was taken from the X-ray structure deposited in the Protein Data Bank (PDB id: 2C9V) [1]. The Zn and Cu ion-binding site is built by the following amino acids: His44, His46, His61, His118, His69, His78, Asp81. The force field parameters and charges for the Zn-Cu binding site were created by analogy to data obtained from previous quantum chemical calculations [2, 3]. All energy minimizations (10 000) and molecular dynamics (MD) simulations were performed in the NAMD program version 2.10 using the all-atom (37 000 atoms) CHARMM27 force field [4] in a periodic box (62 × 95 × 62 Å) with Langevin (stochastic) dynamics [5]. The following mutations: K3E, A4V, G41S, G72S, N86S, D90A, G93C, S105L, C11Y, N139D, L144S, L126X and the native protein were investigated by molecular dynamics simulations. Structural changes of the mutated SOD1 were studied during 20 ns all-atom MD simulations in an explicit water environment. Comparison of the superimposed resulting structures of the mutants with the WT protein revealed how each mutation could influence the structure of the SOD1 dimer and how the potential interactions with other proteins could be altered.
∗ To whom the correspondence should be addressed: [email protected]
[1] Strange, R.W., Antonyuk, S.V., Hough, M.A., Doucette, P.A., Valentine, J.S., Hasnain, S.S., J. Mol. Biol. 2006, 356, 1152
[2] Shen, J., Wong, C.F., Subramaniam, S., Albright, T.A., and McCammon, J.A., J. Comp. Chem. 1990, 11, 346–350
[3] Branco, RJF; Fernandes, PA; Ramos, MJ, J. Phys. Chem. B, 2006, 110, 16754.
[4] MacKerell, Jr. AD, Banavali N, Foloppe N, Biopolymers, 2001, 56, 257–265.
[5] R. Kubo, M. Toda, N. Hashitsume, Statistical Physics II: Nonequilibrium Statistical Mechanics, Springer, 1991.
Transcriptomic analysis of gene expression data from Bos taurus liver using RNA-Seq
Pareek C.S.,1, 2, ∗ Walendzik Paulina,1, 2, † Kadarmideen Haja,3 and Kogelman Lisette3
1 Functional Genomics Lab., Faculty of Biology and Environmental Protection, Nicolaus Copernicus University, Torun, Poland
2 Interdisciplinary Centre of Modern Technology, Nicolaus Copernicus University, Torun, Poland
3 The Animal Breeding, Quantitative Genetics and System Biology Lab., University of Copenhagen, Denmark
RNA-Seq is a relatively novel technology that can be used to analyze changes in gene expression across the entire transcriptome and has been applied to a rapidly increasing number of organisms [1]. In this study we used massively parallel high-throughput transcriptome sequencing (RNA-seq) to characterize the bovine liver transcriptome architecture in three cattle breeds at three developmental ages. The liver is one of the main organs engaged in the regulation of metabolism, especially in overall muscle and body growth in young growing bulls. The bioinformatics analysis was performed using the PARTEK Flow suite and R with specific packages to identify significantly differentially expressed (SDE) genes and the associated over-represented Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways across the whole liver transcriptome of cattle. Overall, we obtained 455 differentially expressed genes from the RNA-Seq data. Detailed analysis of the Venn diagram of overlapping genes in the Hereford breed between 9 and 12 months identified two significant genes associated with fat metabolism: FADS2 (fatty acid desaturase 2) and FASN (fatty acid synthase). Significant GO term results were obtained in just one analysis, namely within the Hereford breed comparing 9 vs 12 months.
Detailed GO analysis of Hereford 9 vs 12 months identified nine biological pathways, five of which are associated with fat metabolism. Similarly, the detailed KEGG analysis of Hereford 9 vs 12 months resulted in the identification of nine pathways, four of which were associated with fat metabolism. The results also indicate that the comprehensive identification and annotation of unknown transcripts from tissue-specific transcriptome analysis using RNA-seq data remains a tremendous future challenge. Supported by the National Science Centre, Krakow, Poland (Project No. 2012/05/B/NZ2/01629). This master's thesis work was carried out partially during a three-month traineeship at the University of Copenhagen in the Animal Breeding, Quantitative Genetics and System Biology Group under the leadership of Professor Haja Kadarmideen.
∗ To whom the correspondence should be addressed: [email protected]
† Presenting author: [email protected]
[1] McCabe M., Waters S., Morris D., Kenny D., et al., (2012), RNA-seq analysis of differential gene expression in liver from lactating dairy cows divergent in negative energy balance, BMC Genom., 20:193.

The evolutionary glance at tunnels in proteins
Alicja Pluciennik,1, ∗ Michal Stolarczyk,1 Sandra Goldowska,1 Magdalena Lugowska,1 Tomasz Magdziarz,1 and Artur Góra1
1 Tunneling Group, Biotechnology Center, Silesian University of Technology, ul. Krzywoustego 8, 44-100 Gliwice, Poland
Enzymes have evolved many strategies of catalysis. One of them is a buried active site connected to the external environment of the protein via tunnels. The properties of the amino acids that form the buried pathway play an important role in the regulation of ligand passage and binding and in the catalytic properties [1]. Changes of residue conformation, as in the case of gating and anchoring amino acids, can also considerably influence the substrate or ligand flow to and from the active site [2].
The conservation of amino acids in protein families makes it possible to predict hot spots for protein engineering. According to Kimura's neutral theory of molecular evolution, highly conserved amino acids play a significant role in protein stability and functionality. However, low conservation values can point to residues responsible for adapting an enzyme to improve its specificity. Therefore the variability or conservation of protein tunnels provides insight into the mechanisms of enzyme selectivity or specificity. The analysis performed aims to link the evolutionary history of residues with their function. Thus, the deeper the insight into the evolutionary status of residues, the more possibilities there are for obtaining desired enzyme properties. The work is supported by a grant SONATA-BIS 2013/10/E/NZ1/00649 financed by The National Science Centre Poland (www.ncn.gov.pl).
∗ To whom the correspondence should be addressed: [email protected]
[1] L. Biedermannová, Z. Prokop, A. Gora, E. Chovancová, M. Kovács, J. Damborský, and R. C. Wade, A Single Mutation in a Tunnel to the Active Site Changes the Mechanism and Kinetics of Product Release in Haloalkane Dehalogenase LinB, 2012, J. Biol. Chem., vol. 287, no. 34, pp. 29062–29074.
[2] A. Gora, J. Brezovsky, and J. Damborsky, Gates of Enzymes, 2013, Chem. Rev., vol. 113, no. 8, pp. 5871–5923.

Redundans: an assembly pipeline for highly heterozygous genomes
Leszek P. Pryszcz1, 2, ∗ and Toni Gabaldón1, 3, 4
1 Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Dr. Aiguader, 88, 08003 Barcelona, Spain
2 International Institute of Molecular and Cell Biology, Warsaw, Poland
3 Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
4 Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluís Companys 23, 08010 Barcelona, Spain
Many genomes display high levels of heterozygosity (i.e. the presence of different alleles at the same loci in homologous chromosomes), with those of hybrid organisms being an extreme case.
The assembly of highly heterozygous genomes from short sequencing reads is a challenging task because it is difficult to accurately recover the different haplotypes. When confronted with highly heterozygous genomes, the standard assembly process tends to collapse homozygous regions and report heterozygous regions in alternative contigs. The boundaries between homozygous and heterozygous regions result in multiple paths that are hard to resolve, which leads to highly fragmented assemblies with a total size larger than expected. This, in turn, causes numerous problems in downstream analyses, e.g. fragmented gene models, wrong gene copy numbers and broken synteny. To circumvent these problems we have developed a pipeline that specifically deals with the assembly of heterozygous genomes by introducing a step to recognise and selectively remove alternative heterozygous contigs. We tested our pipeline on simulated and naturally occurring heterozygous genomes and compared its accuracy to that of other existing tools. Our method was recently published [1] and is freely available at https://github.com/lpryszcz/redundans.
∗ To whom the correspondence should be addressed: [email protected]
[1] L. P. Pryszcz, T. Gabaldón, NAR, 2016.

Ranking of RNA models Using Sphere Consensus
Tomasz Ratajczak1, ∗ and Piotr Lukasiak1
1 Institute of Computing Science, Poznan University of Technology, 60-965 Poznan, Poland
When modeling an unknown RNA molecule for which no reference structure is available, various methods and tools can be used. Most of these tools generate a set of different models, but they rarely provide a score that can be used to select the best model. Even if a score is available, the selection of the highest quality structure is usually done through manual inspection. Here we present an automatic method, based on a consensus approach, to rank models of a single structure with unknown reference.
In our methodology the RNAssess method [1] is used to identify conserved regions of a molecule at various precision levels, capturing both local and global motifs. For each nucleotide a sphere of a specified radius is built and all atoms inside the sphere are selected. The substructure consisting of those atoms is compared between all models. Candidate models containing more common motifs are ranked higher than models with fewer common structures. Using a set of sphere radii, both local and global comparisons are performed. Assuming that the models are generated independently, common motifs can indicate properly predicted regions. The method provides a uniform score that can be used to rank RNA 3D models of the same RNA sequence without knowledge of its 3D structure.
∗ To whom the correspondence should be addressed: [email protected]
[1] P. Lukasiak, M. Antczak, T. Ratajczak, M. Szachniuk, M. Popenda, R.W. Adamiak, J. Blazewicz, NAR, 2015, Volume 43, W502-W506

Tabu Search Algorithm for RNA Partial Degradation Problem
Agnieszka Rybarczyk,1, 2, ∗ Alain Hertz,3 Marta Kasprzak,1, 2 and Jacek Blażewicz1, 2
1 Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
2 Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznań, Poland
3 Department of Mathematics and Industrial Engineering, Ecole Polytechnique and GERAD, Montreal, Canada
In the last few years, great interest in RNA research has been observed, due to the discovery of the roles that RNA molecules play in biological systems. They not only serve as templates in protein synthesis or as adaptors in the translation process but are also involved in the regulation of gene expression. It has been demonstrated that most of them are produced from larger molecules by enzymatic cleavage or spontaneous degradation. In this work, we present our recent results concerning the RNA degradation process.
In our studies we used artificial RNA molecules designed according to the rules of degradation developed by Kierzek and co-workers [1, 2]. On the basis of the results of their degradation, we have proposed the formulation of the RNA Partial Degradation Problem (RNA PDP) and we have shown that the problem is strongly NP-complete [2]. We propose a new efficient heuristic approach in which two tabu search algorithms cooperate. The algorithm can reconstruct a given RNA molecule, taking as input the results of the biochemical analysis of its degradation, which may contain errors (false negatives or false positives). Results of the computational experiment, which demonstrate the quality and usefulness of the proposed method, are presented.
∗ To whom the correspondence should be addressed: [email protected]
[1] R. Kierzek, Methods Enzymol., 2001, 341, 657-75.
[2] J. Blazewicz, M. Figlerowicz, M. Kasprzak, M. Nowacka, A. Rybarczyk, Journal of Computational Biology, 2011, 18, 821-834.

Conformational sampling of a biomolecular rugged energy landscape
J. Rydzewski,1, ∗ R. Jakubowski,1 G. Nicosia,2 and W. Nowak1
1 Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Torun, Poland
2 Department of Mathematics and Computer Science, University of Catania, Viale A. Doria, 6-95125 Catania, Italy
Protein structure refinement using conformational sampling is an important part of current protein studies. In this paper we examined protein structure refinement by means of potential energy minimization using immune computing as a method of sampling conformations. The method was tested on the x-ray structure and 30 decoys of the mutant of [Leu]Enkephalin, a paradigmatic example of the biomolecular multiple-minima problem. To score the refined conformations, we used a standard potential energy function with the OPLSAA force field.
The effectiveness of the search was assessed using a variety of methods. The robustness of sampling was measured by the energy yield function, which quantifies the number of peptide decoys residing in an energetic funnel. Furthermore, the potential energy-dependent Pareto fronts were calculated to elucidate dissimilarities between peptide conformations and the native state as observed by x-ray crystallography. The following conclusions can be drawn: (i) our results suggest that the potential energy landscape has a very rugged nature and is perhaps self-similar, i.e., has a similar character on different metric scales [1]; (ii) the potential energy changes caused by the small-scale movements are unphysically large. This fact is perhaps related to the analytical form of force fields, whose multimodality causes the ruggedness of potential energy landscapes [2]. J. Rydzewski would like to acknowledge financial support from The National Science Centre, Poland (grant 2015/19/N/ST3/02171).
∗ To whom the correspondence should be addressed: [email protected]
[1] R. Elber and M. Karplus. Science 235(4786), 318–321, 1987.
[2] J. Higo, N. Ito, M. Kuroda, S. Ono, N. Nakajima and H. Nakamura. Prot. Sci. 10(6), 1160–1171, 2001.

Ligand diffusion pathways in cytochrome P450cam
J. Rydzewski1, ∗ and W. Nowak1
1 Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Torun, Poland
Computational simulations in molecular biophysics describe in atomic detail the structure, dynamics and many functions of biological macromolecules. The process of ligand diffusion inside proteins is an example of a complex dynamical event that can be modeled using molecular dynamics simulations. The study of biomolecular interactions between a ligand and its biological target is of paramount importance for the design of novel drugs.
Because of that, identifying the ligand access/egress pathways and understanding how ligands migrate through labyrinthine tunnels in proteins is an area of pivotal interest, which has spurred the development of several approaches in computational biophysics. Unfortunately, the process of ligand dissociation is challenging to study experimentally, and in the absence of time-resolved crystallography experiments on ligand intermediates, the actual ligand expulsion pathways remain to a large extent undetermined. Moreover, the complex topology of channels in proteins often makes it difficult to model the ligand escape pathways by classical molecular dynamics simulations, so that both experimental and computational techniques are difficult to apply. We report a recently developed computational methodology involving the reconstruction of reaction coordinates of ligand diffusion by enhanced sampling during molecular dynamics simulations [1]. Moreover, we briefly describe machine-learning procedures that can be helpful during post-processing of the ligand diffusion paths. Namely, we report an application of a nonlinear dimensionality reduction method to represent the high-dimensional configuration space of the ligand-protein dissociation process in a way that facilitates interpretation [2]. We illustrate the above methods on cytochrome P450cam. J. Rydzewski would like to acknowledge financial support from The National Science Centre, Poland (grant 2015/19/N/ST3/02171).
∗ To whom the correspondence should be addressed: [email protected]
[1] J. Rydzewski and W. Nowak. J. Chem. Phys. 143(12), 124101, 2015.
[2] J. Rydzewski and W. Nowak. J. Chem. Theory Comput. 12, 2110–2120, 2016.
Modeling of transcription factor binding sites: a machine learning approach
Karolina Smolińska,1, 2, ∗ Marcin Pacholczyk,1 and Marek Kimmel1, 3
1 Institute of Automatic Control, Silesian University of Technology, Gliwice, Poland
2 Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
3 Department of Statistics, Rice University, Houston, TX, USA
Transcription factor binding sites (TFBSs) are important for many intracellular processes, which is why new methods for modeling and detecting TFBS structures in DNA continue to be developed. TFBSs are traditionally modeled by Position Weight Matrices (PWMs) obtained either computationally or from experimental data. We propose a modification of the computational approach of Alamanova et al. [1], implemented as the 3DTF server by Gabdoulline et al. [2]. The method requires crystal structures of TF-DNA complexes. We observed that tuning the Boltzmann factor weights, used to convert calculated energies to nucleotide probabilities, has a significant impact on the quality of the resulting PWM. Consequently, we created a method for PWM quality improvement based on receiver operating characteristic (ROC) curves and 10-fold cross-validation. Selection of the best performing PWM was based on the area under the curve (AUC). We applied the presented method to data available for members of the NF-κB family (p50p50, p50p65, p50RelB), for p53, and for other TFs such as HSF1 and ERα. We verified the effectiveness of detecting TFBSs with the improved 3DTF matrices on experimental data from the TRANSFAC database. To test the presented technique we compared matrices constructed with the unmodified Alamanova et al. approach, original PWMs downloaded from the 3DTF server, matrices from the 3DTF server improved by our method, and matrices from the TRANSFAC database. The comparison shows significant similarity and comparable performance between matrices improved by our method and experimental matrices (TRANSFAC).
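The conversion of calculated energies to nucleotide probabilities mentioned above can be sketched as follows. This is only an illustration of Boltzmann weighting, not the 3DTF implementation or the authors' tuning procedure: the function name, the single scalar weight beta, the column order A, C, G, T and the toy energy values are assumptions. Each position's probabilities are taken proportional to exp(-beta * E), so lower energies receive higher probability and larger beta gives a sharper matrix.

```python
import numpy as np

def energies_to_pwm(energies, beta=1.0):
    """Convert an (L, 4) table of binding energies into a probability matrix.

    energies : per-position energies for A, C, G, T (lower means better binding)
    beta     : assumed tunable Boltzmann weight; beta = 0 gives a uniform matrix
    """
    e = np.asarray(energies, dtype=float)
    # Subtract the row minimum before exponentiating for numerical stability;
    # this does not change the normalized probabilities.
    weights = np.exp(-beta * (e - e.min(axis=1, keepdims=True)))
    return weights / weights.sum(axis=1, keepdims=True)

# Hypothetical energies for a three-position site (arbitrary units).
toy_energies = [
    [-2.0, 0.0, 0.5, 0.3],
    [0.1, 0.2, -1.5, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
pwm_soft = energies_to_pwm(toy_energies, beta=0.5)
pwm_sharp = energies_to_pwm(toy_energies, beta=3.0)
print(np.round(pwm_soft, 3))
print(np.round(pwm_sharp, 3))
```

In a tuning loop of the kind the abstract describes, one would build such a matrix for each candidate weight and keep the one with the best cross-validated AUC; that selection step is not shown here.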
The proposed approach can be a promising alternative to experimental techniques of detecting TFBSs. This work has been supported by Polish National Science Centre funds under grants SYMFONIA 3: UMO 2015/16/W/NZ2/00314 based at the Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland, OPUS DEC-2012/05/B/NZ2/01618 and BKM based at the Institute of Automatic Control, Silesian University of Technology.
∗ To whom the correspondence should be addressed: [email protected]
[1] Alamanova D., Stegmaier P., Kel A. Creating PWMs of transcription factors using 3D structure-based computation of protein-DNA free binding energies. BMC Bioinformatics, (2010) May 3;11:225
[2] Gabdoulline R., Eckweiler D., Kel A., and Stegmaier P.: 3DTF: a web server for predicting transcription factor PWMs using 3D structure-based energy calculations. Nucl. Acids Res. (2012) 40 (W1): W180-W185

Pattern recognition approach to rheumatic diseases study
Beata Sokolowska,1, ∗ Marta Hallay-Suszek,2 Leszek Czerwosz,1 Teresa Sadura-Sieklucka,3 Krystyna Ksieżopolska-Orlowska,3 and Bogdan Lesyng1, 4
1 Mossakowski Medical Research Center, Polish Academy of Sciences, Warsaw, Poland
2 Interdisciplinary Center for Mathematics and Computational Modelling, University of Warsaw, Poland
3 Institute of Rheumatology, Warsaw, Poland
4 Faculty of Physics, University of Warsaw, Poland
Rheumatic diseases (RDs) are the most common diseases of civilization and their incidence is close to 4-5% [1, 2]. Osteoarthritis (OA) and rheumatoid arthritis (RA) are two serious RDs that account for a significant percentage of musculoskeletal disability. OA is a chronic disease associated with old age and is characterized by irreversible damage to the joint structure. RA is a chronic auto-inflammatory disorder that leads to cartilage destruction, bone erosion, and subsequently to joint deformities. The etiology of the RDs remains unknown.
The clinical data were gathered using a posturographic platform of the Pro-Med Posturography Computer System [3]. In the proposed analytical approach, pattern recognition algorithms were used to differentiate between healthy subjects and rheumatic patients. In the first step, a k-NN classifier was constructed, and then the misclassification rates (Er) were computed using the leave-one-out method. The following posturographic parameters were treated as features: (i) the average radius of sways, (ii) the developed area and (iii) the total length of posturograms, as well as directional components of sways, namely (iv) the length of left-right motions and (v) the length of forward-backward motions. In addition, (vi) the biofeedback coordination estimated the efficiency of posture self-correction. Three standard posturographic tests were applied: with eyes open (EO), with eyes closed (EC), and with visual biofeedback control (BF). In summary: (i) the analysis of the posturographic parameters with the proposed pattern recognition approach allows the rheumatic and healthy groups to be identified, and (ii) the BF test is more effective than the others (Er values of the BF test were significantly smaller than EO and EC values, both before and after feature selection). Owing to their non-invasiveness and simplicity, posturographic studies combined with the pattern recognition approach may be an important, helpful, and reliable tool for clinicians and rehabilitation specialists in diagnosing and monitoring patients with balance disturbances [3, 4].

The authors would like to thank Adam Jóźwik for releasing his k-NN software. The study was supported by the statutory budget of the Polish Academy of Sciences Mossakowski Medical Research Center; computations and analysis were carried out using the computational infrastructure of the Biocentrum-Ochota project.

∗ To whom the correspondence should be addressed: [email protected]
[1] R. Wong, A.M. Davis, E.
Badley et al., Prevalence of arthritis and rheumatic diseases around the world. A growing burden and implications for health care needs. 2010 Arthritis Community Research and Evaluation Unit.
[2] B. Kwiatkowska, F. Raciborski, M. Maślińska et al., RAPORT: Wczesna diagnostyka chorób reumatycznych – ocena obecnej sytuacji i rekomendacje zmian. Wyd. Instytut Reumatologii im. prof. dr. hab. med. Eleonory Reicher, 2014, Warszawa.
[3] B. Sokolowska, L. Czerwosz, M. Hallay-Suszek et al., Posturography in patients with rheumatoid arthritis and osteoarthritis. Adv Exp Med Biol, 2015, 2, 63–70.
[4] L. Czerwosz, E. Szczepek, B. Sokolowska et al., Posturography in differential diagnosis of normal pressure hydrocephalus and brain atrophy. Adv Exp Med Biol, 2013, 755, 311–324.

Massively parallel sequencing in diagnostics of genodermatoses
Justyna Sota,1, ∗ Katarzyna Wertheim-Tysarowska,1 Dominika Śniegórska,1 Alicja Grabarczyk,1 Tomasz Gambin,1 Anna Kutkowska-Kaźmierczak,1 Katarzyna Końska,2 Jolanta Wierzba,3 Katarzyna Woźniak,4 Cezary Kowalewski,4 and Jerzy Bal1
1 Department of Medical Genetics, Institute of Mother and Child, Warsaw, Poland
2 Genetic Counseling, University Children’s Hospital of Cracow, Poland
3 Genetic Counseling for Adults and Children, University Clinical Center, Gdansk, Poland
4 Department of Dermatology, Medical University of Warsaw, Poland

Introduction: Genodermatoses are a large group of inherited skin disorders, often with additional multisystem symptoms. Their genetic basis is highly heterogeneous and includes mutations in more than 100 genes. Next-generation sequencing technologies enable massively parallel sequencing (MPS) of all genes linked with genodermatoses.
Patients and Methods: Five patients with clinical symptoms of different types of genodermatoses were subjected to targeted MPS. Mapping and variant calling were performed with BWA (hg19) and GATK, respectively.
Variant calls were annotated using VariantStudio (Illumina) and Annovar software (http://annovar.openbioinformatics.org). Annotated data were filtered according to 1) the appropriate gene panel, 2) sequencing quality parameters, 3) the frequency of identified variants in population and in-house databases, and 4) in silico prediction of the effect of the mutation on protein function. Identified point mutations were confirmed by Sanger sequencing and, when possible, cosegregation analysis within the family was performed.
Results: Clinical diagnosis was confirmed for all five patients using bioinformatic and cosegregation analysis. The first case was a 33-year-old female with palmoplantar keratoderma and generalized blistering of the skin, resembling symptoms of epidermolysis bullosa simplex (EBS). Analysis revealed a deleterious splice-site mutation in one allele of the KRT1 gene (NM_006121.3: c.591+1G>A), a gene in which mutations have not so far been associated with EBS. The second case was a newborn who died shortly after birth with clinical suspicion of epidermolysis bullosa hereditaria, in whom we identified a deleterious, de novo point mutation in one allele of the KRT5 gene (NM_000424.3: c.527A>G (p.Asn176Ser)) linked with autosomal dominant EBS. The third case was a newborn with clinical symptoms of autosomal recessive harlequin ichthyosis. We detected a previously unreported deletion of a single nucleotide in one allele and a deleterious point mutation in the second allele of the ABCA12 gene (NM_173076.2: c.6194del and c.5848C>T (p.Asn2065ThrfsTer3 and p.Arg1950Ter)). In the fourth case (a 1-year-old girl with a clinical diagnosis of epidermolytic keratoderma) we found a deleterious de novo point mutation in one allele of the KRT10 gene (NM_000421.3: c.467G>A (p.Arg156His)).
Finally, in an 8-year-old girl with clinical symptoms of junctional epidermolysis bullosa (JEB), our analysis revealed two deleterious mutations in the COL17A1 gene: a paternal pathogenic mutation, NM_000494.3: c.1826G>A (p.Gly609Asp), and a maternal novel stop-gain mutation, NM_000494.3: c.1490_1491delinsT (p.Ala497ValfsTer23).
Conclusion: MPS is an efficient and cost-effective method in molecular diagnostics of genodermatoses, which often manifest with high phenotypic and genetic heterogeneity. In patients with nonspecific symptoms of a skin disorder, or in cases where clinical diagnosis requires molecular analysis of a large range of genes, MPS should be considered a method of first choice.

Supported by 2014/13/D/NZ5/03304.

∗ To whom the correspondence should be addressed: [email protected]

CNVs detection algorithm as a useful diagnostic tool in targeted NGS analysis
Justyna Sota,1, ∗ Tomasz Gambin,1 Katarzyna Niepokój,1 Agnieszka Charzewska,1 Anna Kutkowska-Kaźmierczak,1 Anna Jakubiuk-Tomaszuk,2 Alicja Grabarczyk,1 Katarzyna Sobecka,1 Barbara Wiśniowiecka-Kowalnik,1 Marta Kedzior,1 and Monika Gos1
1 Department of Medical Genetics, Institute of Mother and Child, Warsaw, Poland
2 Department of Pediatric Neurology and Rehabilitation, Medical University of Bialystok, Bialystok, Poland

Background: DNA Copy-Number Variations (CNVs), next to Single Nucleotide Variations (SNVs), are an important source of genetic variability responsible for both population diversity and rare genetic diseases. Recent studies have shown that CNVs might be found in approximately 15% of genes in which point mutations are associated with specific monogenic diseases. Moreover, clinical symptoms of patients with recurrent and rare CNVs can overlap with phenotypes caused by SNVs. Therefore, analysing SNVs together with CNVs should be considered.
Progress in next-generation sequencing (NGS) technologies and computational algorithms enables simultaneous identification of SNVs and CNVs in the human genome.
Patients and methods: Seventy-seven patients with various clinical phenotypes were subjected to targeted sequencing covering the "clinome", i.e. clinically relevant regions of the genome (4813 genes, TruSight One, Illumina). Mapping and variant calling were performed with BWA (hg19) and GATK, respectively. Variant calls (SNVs) were annotated using Annovar software (http://annovar.openbioinformatics.org). Annotated data were analysed according to a phenotype-corresponding gene panel. The CNV analysis was performed with CoNIFER (copy number inference from exome reads, http://conifer.sourceforge.net/), an algorithm implemented as a set of Python programs, run against the data from all 77 samples. Positive results were confirmed using array-CGH or MLPA.
Results: Molecular confirmation of diagnosis was obtained in 39/77 (50.6%) cases. Using the CoNIFER algorithm we identified likely pathogenic CNVs in 4/39 (10.3%) patients. The first case is a 7-year-old boy with clinical symptoms of Diamond-Blackfan anemia (Aase-Smith syndrome II, OMIM #105650). There were no pathogenic SNVs in the selected gene panel (RPS7, RPS10, RPS17, RPS19, RPS24, RPS26, RPL5, RPL11, RPL15, RPL26, RPL35A). The CNV analysis revealed a deletion within the chr1:91861459-92841875 region, which includes the RPL5 gene. The second case is a 4-year-old girl with clinical symptoms of Kleefstra syndrome (OMIM #610253). There were no pathogenic SNVs in the selected gene panel (EHMT1, MBD5, KMT2C, SMARCB1, NR1I3). The CNV analysis showed a deletion encompassing the chr9:139874623–140728986 region, in which the EHMT1 gene is located. The third case is a 6-year-old girl with a Noonan syndrome phenotype with hypertrophic cardiomyopathy.
No pathogenic SNV was identified within RASopathy genes, and the CNV analysis revealed a duplication in the 22q11.21 region, which includes the LZTR1 gene. The fourth case is a 19-year-old boy with non-syndromic deafness. The SNV analysis involved >150 genes, but no pathogenic variants were identified. The CNV analysis showed a deletion of the STRC gene and part of the CATSPER2 gene.
Conclusions: Likely pathogenic CNVs corresponding with phenotypes were identified in 4/39 (10.3%) cases with molecular confirmation of diagnosis. Therefore, a CNV detection algorithm is a valuable step of NGS data analysis and should be performed together with SNV annotation.

∗ To whom the correspondence should be addressed: [email protected]

Distribution analysis of L-SAARs in signal peptides across multiple eukaryotes
Michal Stolarczyk,1, ∗ Pawel Labaj,2 and Joanna Polańska1
1 Silesian University of Technology, Gliwice, Poland
2 University of Natural Resources and Life Sciences, Vienna, Austria

It has been shown that DNA sequence repetitions are responsible for diseases called trinucleotide repeat disorders. The most studied is Huntington disease, which develops when polyglutamine tracts encoded by reiterations of the CAG codon extend beyond a certain length. Other known disorders are attributable to repeats of glutamic acid (Friedreich’s ataxia) [1], arginine (Fragile X syndrome) [2] or leucine (Myotonic dystrophy) [3]. Taking this into consideration, trinucleotide repeats are unique structures of DNA and doubtlessly require thorough analysis. Translated into peptide sequences, trinucleotide repeats are especially important, as they directly influence processes taking place in the cells of living organisms. Such sequences can give rise to amino acid repeats (AARs). Single amino acid repeats (SAARs) are reiterations of single amino acids within peptides.
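To make the SAAR definition concrete, here is a minimal sketch that locates runs of a single repeated residue in a protein sequence with a regular expression. The function name, the toy sequence, and the minimum run length of 4 are illustrative assumptions; the abstract does not state which length threshold the authors used.

```python
import re

def find_saars(sequence, min_length=4):
    """Return (residue, start, length) for each run of one repeated residue.

    min_length is a hypothetical threshold chosen for this sketch.
    """
    # (.) captures one residue; \1{n,} requires at least n further copies.
    pattern = re.compile(r"(.)\1{%d,}" % (min_length - 1))
    return [(m.group(1), m.start(), len(m.group(0)))
            for m in pattern.finditer(sequence)]

# Made-up sequence with a leucine run near the N-terminus.
toy = "MKLLLLLAVAQQQQQQSTLLP"
runs = find_saars(toy)
leucine_runs = [r for r in runs if r[0] == "L"]
```

Restricting the search to an annotated signal peptide would only require slicing the sequence before calling the function.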
SAARs occur more frequently in Eukaryotes than in Prokaryotes, which suggests that their creation is a relatively recent evolutionary process [4]. According to the COPASAAR database for proteomic analysis and single amino acid repeats, leucine is the most abundant amino acid creating SAARs in both Prokaryotes and Eukaryotes. Consequently, even a preliminary analysis of the proteomes suggests the significance of SAARs containing leucine. The vast majority of leucine reiterations (90% of all in human) are located at the amino-terminus of proteins. In proteins that undergo translocation, this region is referred to as a signal peptide. Signal peptides have a discernible three-domain structure: an elemental domain of diversified length, a 7-13 residue hydrophobic domain, and a slightly polar domain. Consequently, hydrophobic single amino acid repeats are abundant in signal peptides. However, according to the literature, leucine repeats are overrepresented even more than others [5]. This raises the question of what purpose, besides hydrophobicity, the overrepresentation of leucine repeats serves in signal peptides. Here, we analyze the distribution of leucine repeats found in signal peptides of orthologous Eukaryotic proteins. Thorough analysis facilitates the detection of trends determining the direction of evolution and, prospectively, the determination of the origins of L-SAARs in signal peptides. The study focused chiefly on mammals; the dataset consisted of the proteomes of six organisms: Human, Chimpanzee, House mouse, Cow, Red jungle fowl and Tropical clawed frog. To determine the origins of L-SAARs, the lengths of both L-runs and signal peptides were analyzed. This approach helps to reject the hypothesis that AARs arose only through the replication slippage mechanism, as is thought for tandem repeats in general [6]. The next part of the study was an analysis of amino acid to leucine changes.
This stage suggested that L-SAARs could have originated from point mutations and that replication slippage is not the only phenomenon shaping the genome. However, confirming this thoroughly requires investigating their variability and composition along the evolutionary path at the nucleotide level.

∗ To whom the correspondence should be addressed: [email protected]
[1] V. Campuzano et al., Science, 1996, 271, 1423–1427.
[2] Peprah, E., Annals of Human Genetics, 2012, 76, 178–191.
[3] Mahadevan, M. et al., Science, 1992, 255, 1253–1255.
[4] Depledge, D. P. et al., BMC Bioinformatics, 2005, 6, 196.
[5] Labaj, P. P. et al., FEBS Journal, 2010, 277, 3147–3157.

Conservation metrics overview - pros and cons
Michal Stolarczyk,1, ∗ Alicja Pluciennik,1 Tomasz Magdziarz,1 and Artur Góra1
1 Tunneling Group, Biotechnology Centre, Silesian University of Technology, Gliwice, Poland

Protein engineering can benefit substantially from computer-aided rational design methods. Identifying the amino acids essential for the intrinsic or engineered activity of an enzyme is one of many possible preliminary steps in improving enzyme functionality. There are a number of approaches for identifying amino acids that influence enzyme activity, selectivity or stability. Following the hypothesis that functionally important amino acids are relatively highly conserved, conservation analysis of each multiple sequence alignment (MSA) position seems to be the best-suited method for this purpose [1]. Conservation as an attribute of an MSA can, however, be defined in various ways, depending on the research purpose. Therefore, there is an abundance of conservation metrics in the literature [2]. Their performance differs considerably, so they may suit different analyses. We therefore present a comprehensive comparison of assorted conservation metrics available in the literature, together with our own concept of defining the conservation score.
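For orientation, the sketch below computes one of the simplest entropy-based scores found in the literature for each MSA column. It is a generic baseline, not the authors' proposed score (which the abstract does not define), and the gap handling, the normalisation by a 20-letter alphabet, and the toy alignment are assumptions of this example.

```python
import math
from collections import Counter

def entropy_conservation(column, alphabet_size=20):
    """Score one MSA column: 1.0 = fully conserved, lower = more variable."""
    residues = [r for r in column if r != "-"]  # gaps ignored (an assumption)
    if not residues:
        return 0.0
    counts = Counter(residues)
    n = len(residues)
    # Shannon entropy of the observed residue frequencies.
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    # Normalise by the maximum entropy of a uniform 20-letter column.
    return 1.0 - entropy / math.log(alphabet_size)

# Toy alignment: four sequences, four columns.
msa = ["MKLV", "MKIV", "MRLV", "MKLA"]
scores = [entropy_conservation(col) for col in zip(*msa)]
```

Note that this score depends only on residue fractions, so the three 3:1 columns above receive the same value regardless of which residues are involved.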
In terms of resolution, our score outperforms entropy-based metrics and metrics weighted according to substitution matrices in highly conserved regions. At the same time, it remains correlated with them over the rest of the conservation score range. This characteristic results from employing the conservation duality and attaching importance to the number of amino acids at the MSA position under consideration rather than to their fractions at this position. We also show that a coupled analysis can give the researcher more valuable information about an MSA position than applying a single conservation metric.

∗ To whom the correspondence should be addressed: [email protected]
[1] M. Kimura, Nature, 1968, 217, 624–626.
[2] W.S.J. Valdar, Proteins, 2002, 48, 227–241.

Influence of the primary analysis on discovering differentially expressed genes based on RNA-Seq data
Alicja Szabelska,1, ∗ Joanna Zyprych-Walczak,1 Idzi Siatkowski,1 and Michal Okoniewski2
1 Department of Mathematical and Statistical Methods, Poznan University of Life Sciences
2 Scientific IT Services, ETH Zurich

RNA-Seq uses the capabilities of next-generation sequencing (NGS) technologies to measure the presence of sequences transcribed from all genes simultaneously. These measurements can be used to estimate differential expression between groups of biological samples or to detect novel transcripts and gene isoforms. A number of statistical and computational methods can tackle the analysis and management of the massive and complex datasets produced by the sequencers. Still, those methods are often prone to distortion by algorithmic and technological artifacts as well as by noise added by the laboratory methods. The analysis of RNA-Seq data starts with primary analysis, which is most often mapping (alignment) to the genome.
This is followed by genomic feature extraction, counting, and normalization, which produce the input to the statistical tests. This study focuses on a comprehensive comparison of two different mappers [1, 2] and five normalization methods [3] and their impact on the results of gene expression analysis. We show that primary analysis has a profound effect on the results. In particular, we show that for many important genes the calculated expression levels and differential expression can differ widely, which is in line with the findings of a recent study [4]. In conclusion, we suggest possible good practices that can bring RNA-Seq data analysis closer to the "biological truth" that it attempts to find.

∗ To whom the correspondence should be addressed: [email protected]
[1] Liao Y, Smyth GK and Shi W. The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Research, 2013, 41(10):e108
[2] Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biology, 2013, 14:R36
[3] Leng N, Dawson J, Thomson J, et al. EBSeq: an empirical bayes hierarchical model for inference in RNA-seq experiments. University of Wisconsin: Tech. Rep. 2012, 226
[4] Robert C and Watson M, Errors in RNA-Seq quantification affect genes of relevance to human disease. Genome Biology, 2015, 16:177

Towards a simple mathematical model of hampered diffusion in biological setting
Piotr Weber,1, ∗ Wieslaw Nowak,1 and Piotr Peplowski1
1 Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Poland

The cell membrane is a barrier that surrounds the cytoplasm of living cells and separates the intracellular components from the external environment.
It also allows selective transport of molecules and regulates what enters and exits the cell. Molecules move across the membrane by a number of transport mechanisms. Despite rich knowledge about these mechanisms, some cases remain puzzling to the scientific community. Several works have shown that the residence times of small peptides in the cell membrane can be very long, much longer than for a simple diffusion process [1]. Measurements of protein motion in cell membranes are also frequently compatible with a subdiffusive process [2–6].

\[
\frac{\partial W(x,t)}{\partial t} + a\,{}^{C}_{0}D^{\alpha}_{t} W(x,t) = V\,\frac{\partial W(x,t)}{\partial x} + \frac{\sigma^{2}}{2}\,\frac{\partial^{2} W(x,t)}{\partial x^{2}},
\]

where ${}^{C}_{0}D^{\alpha}_{t}$ is a Caputo fractional derivative. This asymptotic form is determined by parameters describing the underlying stochastic motion. We also show the density evolution according to the fractional differential equation for the asymptotic model and obtain solutions for various model parameters.

∗ To whom the correspondence should be addressed: [email protected]
[1] K. Kuczera, private communication
[2] F. Höfling, T. Franosch, Reports on Progress in Physics, 2013, Vol. 4, No.4, 046602
[3] I. Goychuk, P. Hänggi, Physica A, 2003, Vol. 325, 9-18.
[4] I. Goychuk, P. Hänggi, Physical Review E, 2004, Vol. 70, 051915.
[5] T.F. Nonnenmacher, D.J.F. Nonnenmacher, Physics Letters A, 1989, Vol. 140, 323-326
[6] P. Weber, P. Peplowski, Acta Physica Polonica B, 2013, 44, 1173-1184.

StructAnalyzer - a tool for sequence vs. structure similarity analysis
Jakub Wiedemann1, ∗ and Maciej Milostan2
1 Institute of Computing Science & European Centre for Bioinformatics and Genomics, Poznan University of Technology
2 Institute of Bioorganic Chemistry, Polish Academy of Sciences, Z. Noskowskiego 12/14, 61-704 Poznan, Poland

Comparative analysis of structures and biological sequences may lead to the determination of characteristic structural and functional elements of biological compounds.
Thorough exploration of the sequence-structure space makes it possible to identify, on the one hand, molecules (protein or RNA) with totally different sequences but similar structures and, on the other hand, molecules with similar sequences but different structures. The latter case is currently the most interesting from our perspective, because it makes it possible to identify sequences prone to significant structural change due to a small number of point mutations. Analysis of multiple sequences and structures is demanding in terms of computation time and complexity. Parallel processing is therefore advisable in such cases and yields results in much less time. Here we present StructAnalyzer, a tool for sequence versus structure similarity analysis.

∗ To whom the correspondence should be addressed: [email protected]

The impact of the crossover operator on the results of evolutionary-based algorithms in the problem of the genetic code optimization
Pawel Blażej,1 Malgorzata Wnetrzak,1, ∗ and Pawel Mackiewicz1
1 Department of Genomics, Faculty of Biotechnology, University of Wroclaw, ul. Joliot-Curie 14a, Wroclaw, Poland

One of the most popular theories concerning the origin and evolution of the standard genetic code is the adaptive hypothesis. It assumes that the genetic code was optimized to minimize the harmful effects of mutations and translational errors leading to amino acid replacements in the coded proteins [1, 2]. The best way to assess the extent of this optimization is to compare the standard genetic code with optimized alternatives, which can be found in the space of possible genetic codes. However, the total number of these possible codes is extremely large, i.e., greater than 1.51×10^84, assuming 64 nucleotide triplets encoding 20 amino acids and three stop translation signals. Therefore, in our searches, we used an Evolutionary Algorithms (EA) approach, as did the authors of the previous works related to this topic [3].
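The size of the search space quoted above can be checked with a short calculation. The sketch assumes that a "possible code" means any assignment of the 64 codons to 21 meanings (20 amino acids plus a stop signal) in which every meaning is used at least once; this interpretation is an assumption of the example, not something the abstract spells out, and the function name is illustrative.

```python
from math import comb

def count_surjections(n_items, n_labels):
    """Number of ways to assign n_items to n_labels with every label used."""
    # Inclusion-exclusion over the set of labels left unused.
    return sum((-1) ** k * comb(n_labels, k) * (n_labels - k) ** n_items
               for k in range(n_labels + 1))

# 64 codons mapped onto 20 amino acids + 1 stop signal.
n_codes = count_surjections(64, 21)
print(f"{n_codes:.3e}")
```

Under that assumption the result has the same order of magnitude as the figure in the text, which illustrates why exhaustive enumeration is out of reach and heuristic search is used instead.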
However, the earlier simulations [3] were based only on mutation operators. Since it is well known that EAs are founded not only on mutation but also on crossover operators, we proposed crossover operators suitable for the considered models of the genetic code and presented the possible advantages of using them together with mutation to solve the problem of genetic code optimization. We compared algorithms with various mutation and crossover probabilities under two different models of the genetic code. The results indicate that using the crossover operator can significantly improve the quality of solutions obtained in the single-objective optimization case. Our results demonstrate that the standard genetic code is only locally optimized, because it is possible to find alternatives that are at least two times better optimized.

∗ To whom the correspondence should be addressed: [email protected]
[1] S. J. Freeland, T. Wu, N. Keulmann, Orig Life Evol Biosph, 2003, 33(4-5): 457-477
[2] R.D. Knight, S.J. Freeland, L.F. Landweber. Trends Biochem. Sci., 1999, 24, 241-247
[3] J. Santos, A. Monteagudo, BMC Bioinformatics, 2011, 12, 1-8

Correlated mutations select misfolded from properly folded proteins
Pawel P. Woźniak,1, ∗ Malgorzata Kotulska,1 and Gert Vriend2
1 Department of Biomedical Engineering, Wroclaw University of Science and Technology, Wroclaw, Poland
2 Centre for Molecular and Biomolecular Informatics, Radboudumc, Nijmegen, the Netherlands

Knowledge about the three-dimensional structure of proteins is a prerequisite for studies on their behavior, their stability, or their role as targets in drug design. The traditional methods for structure determination are either experimental, such as X-ray crystallography and NMR, or in silico, such as homology modeling. Experimental methods are costly and time-consuming, while homology modeling depends on the availability of a homologous protein structure.
Therefore, computational methods that allow protein structure reconstruction from sequence alone have long been sought. One of these is the recently developed direct coupling analysis (DCA) method [1, 2]. The DCA method achieves the best results in residue-residue contact prediction from multiple sequence alignments alone. Predicted contacts are used as restraints in the reconstruction of the three-dimensional structure of a protein. Unfortunately, the accuracy of present-day DCA methods is on the order of 40% among the 100 strongest predicted contacts. This is insufficient for ab initio protein structure reconstruction. The results of DCA can, however, support protein structure reconstruction in several ways. We showed that the DCA algorithm is able to indicate the better structure among properly folded and misfolded variants through the prediction of residue-residue contacts for these proteins. We counted the number of correctly predicted contacts among the strongest 100 predictions made by DCA for a set of obsolete PDB files and their successors, and for 22 proteins for which the Decoys 'R' Us database [3] provided properly folded and misfolded structures. These counts were related to structure similarity scores, such as RMSD or TM-score [4]. DCA correctly predicts significantly more contacts for properly folded structures than for misfolded ones. Our method works much better for structures determined with X-ray crystallography than for those determined with NMR spectroscopy. We discuss the most interesting cases. The method will not detect misfolded proteins per se, but when a protein structure experimentalist needs to choose between alternative folds for the same protein, DCA seems a useful aid.

∗ To whom the correspondence should be addressed: [email protected]
[1] F. Morcos, A. Pagnani, B. Lunt, A. Bertolino, D.S. Marks, C. Sander, R. Zecchina, J.N. Onuchic, T. Hwa, M.
Weigt, Direct-coupling analysis of residue coevolution captures native contacts across many protein families, 2011, Proc Natl Acad Sci U S A 108(49):E1293-301.
[2] C. Feinauer, M.J. Skwark, A. Pagnani, E. Aurell, Improving contact prediction along three dimensions, 2014, PLoS Comput Biol., 10(10):e1003847.
[3] R. Samudrala, M. Levitt, Decoys 'R' Us: A database of incorrect protein conformations to improve protein structure prediction, 2000, Protein Science 9: 1399-1401.
[4] Y. Zhang, J. Skolnick, TM-align: A protein structure alignment algorithm based on TM-score, 2005, Nucleic Acids Research, 33: 2302-2309.

Mathematical modelling of immune response
Joanna Ziobro,1, ∗ Pawel Blażej,1 and Pawel Mackiewicz1
1 Department of Genomics, Faculty of Biotechnology, University of Wroclaw, ul. Joliot-Curie 14a, Wroclaw, Poland

The dynamic development of medicine and the invention of new treatments require modifying current models and creating new ones that describe the immunological reactions of the human organism. The immune response is a complex set of defensive reactions that includes antigen recognition, neutralization and elimination. There are two interdependent mechanisms of immune response: the cellular and the humoral response. We analyzed two models that describe the humoral type of human immune response. The first model describes the reaction between antibody and antigen. The second, Marchuk’s model, is more complicated because it takes into account the delay of lymphocyte proliferation with respect to antigen presentation [1]. There are two stationary states in each model. One describes the healthy state of an organism and the other characterizes a chronic disease. We analyzed the stability of the stationary states of these models. Our studies show that the first model does not describe all the main processes of the immune response, and that even a very strong immune system is not always able to deal with a large dose of antigen.
The stability of the first state in Marchuk’s model does not depend on the delay. However, the delay can be important in the case of the chronic state [2]. We presented a method for analyzing systems of delay differential equations [3]. Equations of this type describe well complex biological processes, such as responses to infection, that often occur with some delay. A system of delay differential equations has several features that make its analysis more complicated than that of a system of ordinary differential equations. We studied the stability of the chronic state depending on the length of the delay.

∗ To whom the correspondence should be addressed: [email protected]
[1] G.I. Marchuk, R.V. Petrov, A.A. Romanyukha, G.A. Bocharov, J. theor. Biol., 1991, 151, 1-69
[2] M. Bodnar, U. Foryś, Internat. J. Appl. Math. Comput. Sci., 2000, 10 (I), 101-116
[3] F.M. Asl, A.G. Ulsoy, J. Dyn. Syst. Meas. Cont., 2003, 125 (2), 215-223

A new method to evaluate quality of RNA 3D models
Tomasz Zok,1, ∗ Maciej Antczak,1 Piotr Lukasiak,1, 2 and Marta Szachniuk1, 2
1 Institute of Computing Science, Poznan University of Technology
2 Institute of Bioorganic Chemistry, Polish Academy of Sciences

The area of computational modelling of RNA 3D structures is constantly gaining attention. With the advent of new approaches and the constant refinement of existing ones, the difficulty of problems that can be tackled by computational means is rising. At the same time, the difficulty of another challenge is also on the rise: quality evaluation, especially when one lacks access to homologous structures. We propose a new method that addresses the problem of quality evaluation in two ways. Both of them rely on RNApdbee [1], which is required to unify and aggregate information about RNA 3D structure from various external tools. In the first mode, the method scores each of many models on a scale from 0 to 1 and thereby constructs a ranking.
This is achieved through a voting mechanism in which every model provides its input on the correctness or wrongness of base-base interactions. A consensus of all inputs is constructed and treated as a virtual target structure against which all models are ranked by means of Interaction Network Fidelity (INF). The same virtual structure is used in the second variant of the method. Here, however, each base-base interaction is individually assessed as right or wrong. As both variants rely on the constructed virtual target structure, it is of great importance to define precisely how the consensus is obtained. We conducted a series of computational experiments to refine the consensus threshold. Next, the obtained rankings and base-base interaction scores were compared with our benchmark, built from a few past RNA-Puzzles challenges. We found that recreating the RNA-Puzzles ranking and predicting the INF score are indeed two different aims, and that each requires a different consensus threshold value. With these known, we participated in the newly created Quality Prediction category of the RNA-Puzzles contest.

∗ To whom the correspondence should be addressed: [email protected]
[1] M. Antczak, T. Zok, M. Popenda, P. Lukasiak, R.W. Adamiak, J. Blazewicz & M. Szachniuk. RNApdbee – a webserver to derive secondary structures from pdb files of knotted and unknotted RNAs. NAR, 2014, 42(W1), W368-W372

Study on correlated mutations in plant proteinase inhibitors - Bowman-Birk and Kunitz family
Agata Żyźniewska1, ∗ and Jacek Leluk1
1 University of Zielona Góra, Faculty of Biological Sciences, Department of Biological Sciences, Zielona Góra, Poland

Plant proteinase inhibitors (PIs) are small proteins, generally present at high concentration in storage tissues, but also detectable in leaves in response to attack by insects and pathogenic microorganisms.
PIs' contribution to plant defense mechanisms relies on inhibition of proteinases present in insects' guts or produced by microorganisms, causing a reduction in the availability of amino acids necessary for their growth and development [1]. Plants contain many different types of proteinase inhibitors. They belong to a group of anti-nutritional factors. Organisms exposed to their action may be affected negatively (e.g. trypsin chelation) or positively (e.g. antioxidant properties).

Bowman-Birk inhibitors (BBI) are small serine proteinase inhibitors found in leguminous and gramineous plants. Characteristically, their molecular masses are about 7 kDa (about 70 amino acid residues) and they are rich in disulfide bonds. The Bowman-Birk inhibitors are also recognized as potential cancer chemopreventive agents [2, 3]. Humans consuming large amounts of BBI in their diet have been shown to exhibit lower rates of colon, breast, prostate and skin cancers. In our study of this group of inhibitors, 6 clusters of correlated mutations were identified and described in functionally significant regions.

Plant Kunitz-type inhibitors are present in leguminous seeds. The first discovered inhibitor from this family (SBTI) was obtained from Glycine max seeds and, over the past three decades, a large number of other inhibitors have been found. Data on the primary and tertiary structure of plant Kunitz-type inhibitors help to understand their mechanisms of action in relation to coagulation factors, inflammation and tumors, and make it possible to investigate which region of the protein is responsible for its biological activity [4, 5]. In our study of this protein group, 11 clusters of correlated mutations were described.

The results of our study were obtained with the aid of new original software designed for phylogenetic studies, analysis of mutational variability within homologous proteins, and identification of correlated mutations.
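
The abstract does not describe the algorithm behind the authors' software, so the following is only a minimal sketch of one widely used way to flag candidate correlated positions: mutual information between columns of a multiple sequence alignment. The function names, the toy alignment and the threshold are illustrative assumptions and are not taken from the study.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * log2(p_ab * n * n / (count_a[a] * count_b[b]))
    return mi


def correlated_pairs(alignment, threshold):
    """Return (i, j, MI) for every column pair with MI >= threshold.

    Assumes all sequences are aligned to the same length; gap symbols,
    if present, are treated like any other character (a simplification).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    pairs = []
    for i, j in combinations(range(length), 2):
        col_i = [seq[i] for seq in alignment]
        col_j = [seq[j] for seq in alignment]
        mi = mutual_information(col_i, col_j)
        if mi >= threshold:
            pairs.append((i, j, mi))
    return pairs


# Hypothetical toy alignment: positions 0 and 3 always change together.
toy_alignment = ["ACDK", "ACEK", "GCDR", "GCER"]
print(correlated_pairs(toy_alignment, threshold=0.5))  # [(0, 3, 1.0)]
```

In this toy case only columns 0 and 3 co-vary, so they are the only pair reported. On real protein families, raw mutual information is usually corrected for phylogenetic background and small sample size before clusters of correlated positions are interpreted.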

∗ To whom the correspondence should be addressed: [email protected]

[1] Ryan C.A. (1990) Protease inhibitors in plants: genes for improving defenses against insects and pathogens, Annu. Rev. Phytopathol., 28, 425-449
[2] Lippman S.M., Matrisian L.M. (2000) Protease Inhibitors in Oral Carcinogenesis and Chemoprevention, Clinical Cancer Research, 6, 4599-4603
[3] Jaulent A.M., Leatherbarrow R.J. (2004) Design, synthesis and analysis of novel bicyclic and bifunctional protease inhibitors, Protein Engineering, Design and Selection, 17 (9), 681-687
[4] Oliva M.L.V., Sampaio M.U. (2009) Action of plant proteinase inhibitors on enzymes of physiopathological importance, Annals of the Brazilian Academy of Sciences, 81 (3), 615-621
[5] Major I.T., Constabel C.P. (2008) Functional Analysis of the Kunitz Trypsin Inhibitor Family in Poplar Reveals Biochemical Diversity and Multiplicity in Defense against Herbivores, Plant Physiology, 146, 888-903