BIT16 Book of Abstracts

16-18 June 2016, Toruń, Poland
PROGRAM COMMITTEE:
• Prof. Wieslaw Nowak (Nicolaus Copernicus University, Torun, Poland)
• Prof. Janusz Bujnicki (International Institute of Molecular and Cell Biology in Warsaw and Adam Mickiewicz University in Poznan, Poland)
• Prof. Jarek Meller (University of Cincinnati, USA)
• Prof. Jerzy Tiuryn (University of Warsaw, Poland)
• Dr. hab. Witold Rudnicki (University of Warsaw, Poland)
LOCAL ORGANIZING COMMITTEE:
• Prof. Wieslaw Nowak (Nicolaus Copernicus University, Torun, Poland)
• Dr. Aleksandra Gruca (Silesian University of Technology, Gliwice, Poland)
• Dr. Lukasz Peplowski (Nicolaus Copernicus University, Torun, Poland)
• Dr. Anna Gogolińska (Nicolaus Copernicus University, Torun, Poland)
• M. Eng. Rafal Jakubowski (Nicolaus Copernicus University, Torun, Poland)
• M. Eng. Jakub Rydzewski (Nicolaus Copernicus University, Torun, Poland)
Contents
Lectures
Eran Elhaik
From DNA to Home Village in 3 seconds: using the Geographic Population Structure (GPS) to empower personalized medicine
Rafal Ploski
Whole exome sequencing in medical genetics
Andrzej Kloczkowski
Integration of structural and dynamics data for proteomics
Andrzej Koliński
Multiscale modeling of large protein systems
Pawel P. Labaj
City microbiome as a missing element of exposome for Personalized Wellness and Medicine
Lucjan Wyrwicz
Prediction of non-interacting protein pairs for ’omics’ studies in translational medicine
Malgorzata Kotulska
On healthy and unhealthy contacts. Bioinformatics point of view
Alexander Wlodawer
Structural studies of medically-interesting protease inhibitors and lectins
Michal Laźniewski
Application of molecular docking for predicting protein substrate specificity
Mai Suan Li
A novel method for navigating optimal path for ligand escape from binding site and application to drug design
Sebastian Kmiecik
Peptide drug design using CABS-dock web server for protein-peptide docking
Karina Kubiak-Ossowska
Point-of-Care (PoC) testing kits. Focus on antibodies
Dimitar I. Vassilev
Models for error and variants detection in de novo sequenced samples and communities
Katarzyna Werheim-Tysarowska
Bioinformatics in genetics of rare disorders from molecular diagnostician point of view
Jacek Leluk
Comparative studies on proteins. The effect of the wrong assumptions on the result accuracy and reliability
Jacek Śmietański
Machine Learning Trends in RNA Bioinformatics
Maciej Sykulski
A Mixture Gaussian Bayesian Graphical Model in application to DNA microarray segmentation robust to spatial noise
Lukasz Peplowski
Application of the Molecular Dynamics Simulations in Health Related Issues
Jarek Meller
LINCS Molecular Signatures as a Resource for Personalized Precision Medicine
Posters
M. Antczak • 1
Application of new functionalities of RNAComposer in order to improve prediction accuracy
P. Boguslawska • 2
Modeling inhibitory mechanisms in biological systems using Petri nets
M. Burdukiewicz • 3
AmyloGram: n-gram analysis and prediction of amyloids
K. Chmielewska • 4
Selected aspects of the participation of tobacco smoke in the development of atherosclerosis modeled and analyzed using Petri nets
M. P. Ciemny • 5
CABS-dock web server for flexible docking of peptides to proteins
P. Daniluk • 6
Computing maximal cliques on GPU – application to structural alignments
I. Deb • 7
Epigenetic modifications in RNA: Molecular dynamics studies
R. Filip • 8
Computational study on hemagglutinin antigenic sites of influenza virus type A
W. Frohmberg • 9
New approach to de-novo genome assembly using string graph fork detection technique and longest contig path scaffolding method
M. Garbulowski • 10
Predicting the age status of somatic mutations by Gaussian mixture decomposition
A. Gogolińska • 11
Complete Petri net study of medically relevant interactions in the immune system model
S. Goldowska • 12
Modification of the accessibility of the human soluble epoxide hydrolase active site, in silico study
J. Jablońska • 13
New method of Sholl analysis
R. Jakubowski • 14
Overminimization in molecular dynamics simulations
P. Kosiorek • 15
Influence of the artificial protein nanotube on a cell membrane
J. A. Kowalska • 16
Searching for structural patterns in the vicinity of microRNA in plants
M. Kurczyńska • 17
PyRosetta energy terms as indicators for protein mirror models
M. Laźniewski • 18
Application of molecular docking for predicting protein substrate specificity
T. Magdziarz • 19
PANTA RHEI: analysis of solvent flow in MD simulations
M. Mielczarek • 20
NGS-based analysis of copy number variations in various cattle breeds
P. Miszta • 21
Theoretical study of influence of mutations on superoxide dismutase SOD1 dimer by molecular dynamics
C. Pareek • 22
Transcriptomic analysis of gene expression data from Bos taurus liver using RNA-Seq
A. Pluciennik • 23
The evolutionary glance at tunnels in proteins
L. P. Pryszcz • 24
Redundans: an assembly pipeline for highly heterozygous genomes
T. Ratajczak • 25
Ranking of RNA models Using Sphere Consensus
A. Rybarczyk • 26
Tabu Search Algorithm for RNA Partial Degradation Problem
J. Rydzewski • 27
Conformational sampling of a biomolecular rugged energy landscape
J. Rydzewski • 28
Ligand diffusion pathways in cytochrome P450cam
K. Smolińska • 29
Modeling of transcription factor binding sites – a machine learning approach
B. Sokolowska • 30
Pattern recognition approach to rheumatic diseases study
J. Sota • 31
Massively parallel sequencing in diagnostics of genodermatoses
J. Sota • 32
CNVs detection algorithm as a useful diagnostic tool in targeted NGS analysis
M. Stolarczyk • 33
Distribution analysis of L - SAARs in signal peptides across multiple eukaryotes
M. Stolarczyk • 34
Conservation metrics overview - pros and cons
A. Szabelska • 35
Influence of the primary analysis on discovering differentially expressed genes based on RNA-Seq data
P. Weber • 36
Towards a simple mathematical model of hampered diffusion in biological setting
J. Wiedermann • 37
StructAnalyzer - a tool for sequence vs. structure similarity analysis
M. Wnetrzak • 38
The impact of the crossover operator on the results of evolutionary-based algorithms in the problem of the genetic code optimization
P. Woźniak • 39
Correlated mutations select misfolded from properly folded proteins
J. Ziobro • 40
Mathematical modelling of immune response
T. Zok • 41
A new method to evaluate quality of RNA 3D models
A. Żyźniewska • 42
Study on correlated mutations in plant proteinase inhibitors - Bowman-Birk and Kunitz family
Lectures
From DNA to Home Village in 3 seconds: using the Geographic Population
Structure (GPS) to empower personalized medicine
Eran Elhaik1
1 University of Sheffield, Department of Animal and Plant Sciences, Sheffield, UK
Knowledge of a person's place of origin is valuable for studying history, anthropology, genetics,
and epidemiology, and is critically important in pharmacokinetics, where a growing number of
treatments differ in their efficacy when applied to different populations. It is therefore
not surprising that the search for a method that uses biological information to predict humans'
place of origin has occupied scientists for millennia. Over the past four decades, scientists have
employed genetic data to address this question with limited success. Biogeographical algorithms using
next-generation sequencing data achieved an accuracy of 700 km in Europe but were inaccurate
elsewhere. The Geographic Population Structure (GPS) method [1] uses a small number of SNPs and
can place 83% of individuals worldwide in their country of origin. Applied to over 200 Sardinian
villagers, GPS placed a quarter of them in their villages and most of the rest within 50 km of
their villages. Recently, GPS localized Ashkenazic Jews to 1,500-year-old villages in northeastern
Turkey whose names likely derive from the word "Ashkenaz" [2]. The accuracy and power of
GPS to infer the geographical origin of individuals worldwide down to their country or, in some
cases, village of origin underscore the promise of admixture-based methods for biogeography and
have broad ramifications for genetic ancestry testing, disease studies, and personalized medicine.
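The placement accuracies quoted above (700 km, 50 km) are distances between a predicted and a true location. As a minimal sketch of how such a figure could be computed, the following uses the haversine great-circle formula; it is not the GPS algorithm itself, and the function names, the Earth radius constant, and the example coordinates are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def fraction_within(predicted, true, radius_km):
    """Fraction of individuals whose predicted (lat, lon) lies within
    radius_km of their true (lat, lon)."""
    hits = sum(great_circle_km(*p, *t) <= radius_km
               for p, t in zip(predicted, true))
    return hits / len(true)


# Hypothetical coordinates, for illustration only.
predicted = [(0.0, 0.0), (10.0, 10.0)]
true = [(0.0, 0.1), (20.0, 20.0)]
print(fraction_within(predicted, true, radius_km=50.0))
```

A summary such as "most of the rest within 50 km" would then correspond to `fraction_within(..., radius_km=50.0)` evaluated on the real predictions.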
[1] Elhaik, E. et al. Geographic population structure analysis of worldwide human populations infers their
biogeographical origins. Nat. Commun. 5, doi:10.1038/ncomms4513 (2014).
[2] Das, R., Wexler, P., Pirooznia, M. & Elhaik, E. Localizing Ashkenazic Jews to primeval villages in the
ancient Iranian lands of Ashkenaz. Genome Biol. Evol., doi:10.1093/gbe/evw046 (2016).
Whole exome sequencing in medical genetics
Rafal Ploski1, ∗
1 Department of Medical Genetics, Warsaw Medical University
In 2012 the Department of Medical Genetics (Warsaw Medical University) acquired an Illumina
HiSeq 1500, which allowed us to establish whole exome sequencing (WES) as a method for both research
and diagnostic purposes. Since then we have performed more than 1000 WES analyses, most of which
aimed at finding a diagnosis in patients suspected of suffering from rare neurological disorders with a
genetic basis. We also established a bioinformatics infrastructure and a pipeline that allows efficient
analysis of the WES data. During the lecture, selected findings will be presented illustrating how
WES enables the discovery of new mutations in known disease-associated genes (including mutations
associated with novel phenotypes) as well as the discovery of novel diseases (i.e. those caused by
mutations in genes not yet associated with known human diseases).
∗ To whom correspondence should be addressed: [email protected]
Integration of structural and dynamics data for proteomics
Andrzej Kloczkowski1
1 The Ohio State University College of Medicine, Columbus, USA
Multiscale modeling of large protein systems
Andrzej Koliński1
1 Faculty of Chemistry, University of Warsaw, Poland
Traditional computational modeling of protein structure, dynamics and interactions remains
difficult for many protein systems, mostly because the protein conformational spaces and the
required simulation timescales are still too large to be studied in atomistic detail. Lowering
the level of protein representation from all-atom to coarse-grained opens up new possibilities
for studying protein systems [1]. Possible multiscale strategies for efficient coarse-grained
modeling and recent applications of CABS modeling tools are briefly discussed. CABS (C-Alpha, Beta
and Side-chain) is a medium-resolution model. Compared with other realistic coarse-grained
models, CABS provides similar resolution but is based on qualitatively different interaction and
sampling concepts. The main chain is represented by two united pseudo-atoms per residue. Side
chains are represented by two spherical pseudo-atoms, one centered on
Cβ and the other placed at the center of mass of the remaining portion of the side chain, where
applicable. The main-chain Cα positions are restricted to the knots of a cubic lattice with a small
spacing of 0.61 Å. This lattice Cα trace is the only independent variable and defines the
positions of the other united atoms. Recently, we provided several easy-to-use web servers based on
CABS modeling techniques (available at: http://biocomp.chem.uw.edu.pl/tools). The
servers are dedicated to de novo and comparative protein structure prediction [2], studies of
protein dynamics [3], and unrestrained, fully flexible docking of peptides to protein receptors [4, 5].
Multiscale modeling strategies for studies of large protein complexes, combining CABS simulations
with all-atom Molecular Dynamics, are briefly discussed.
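The lattice restriction on Cα positions can be illustrated with a short sketch. It assumes that an off-lattice Cα coordinate is mapped to the nearest lattice knot by simple rounding; the abstract does not say how CABS performs this projection, so the function names and the rounding rule are illustrative assumptions. Only the 0.61 Å spacing comes from the text.

```python
LATTICE_SPACING = 0.61  # Å, the lattice spacing quoted in the abstract


def snap_to_lattice(coords, spacing=LATTICE_SPACING):
    """Map each (x, y, z) in Å to the integer indices of the nearest knot
    of a cubic lattice (nearest-knot rounding is an assumption)."""
    return [tuple(round(c / spacing) for c in xyz) for xyz in coords]


def lattice_to_angstrom(knots, spacing=LATTICE_SPACING):
    """Convert integer lattice indices back to Cartesian coordinates in Å."""
    return [tuple(i * spacing for i in knot) for knot in knots]


# Hypothetical Cα coordinates, for illustration only.
ca_trace = [(3.7, -1.2, 8.05), (0.0, 0.61, 1.30)]
knots = snap_to_lattice(ca_trace)
print(knots)
print(lattice_to_angstrom(knots))
```

With nearest-knot rounding, each coordinate moves by at most half the lattice spacing (about 0.3 Å) per axis, which gives a feel for why such a fine lattice costs little in resolution.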
Support from the National Science Center (Poland) grant MAESTRO 2014/14/A/ST6/00088
is kindly acknowledged.
[1] Kmiecik S, Gront D, Kolinski M, Wieteska L, Dawid , Kolinski A (2016) Coarse-grained protein models
and their applications. Chemical Reviews, in press
[2] Blaszczyk M, Jamroz M, Kmiecik S, Kolinski A (2013) CABS-fold: server for the de novo and consensus-based prediction of protein structure. NAR 41(W1):W406-W411
[3] Jamroz M, Kolinski A, Kmiecik S (2013) CABS-flex: server for fast simulation of protein structure
fluctuations. NAR 41(W1):W427-W431
[4] Kurcinski M, Jamroz M, Blaszczyk M, Kolinski A, Kmiecik S (2015) CABS-dock: web server for flexible
docking of peptides to proteins without prior knowledge of the binding site. NAR 43(W1):W419-W424
[5] Blaszczyk M, Kurcinski M, Kouza M, Wieteska L, Debinski A, Kolinski A, Kmiecik S (2016) Modeling of
protein-peptide interactions using the CABS-dock web server for binding site search and flexible docking.
Methods 93:72-83
City microbiome as a missing element of exposome for Personalized Wellness
and Medicine
Pawel P. Labaj1, ∗
1 Chair of Bioinformatics RG, Boku University Vienna, Austria
APART Fellow of Austrian Academy of Science, MetaSUB International Consortium
In the era of fast-paced development of technology and services, there are limitless opportunities
for customization to meet specific user needs. This is understandable, since non-specific
interventions for non-targeted populations often fall short of the desired health
outcomes. Over the next decade, as much as half of health care will shift from
the hospital and clinic to the home and community [1]. With Personalized Medicine understood
as prevention and treatment strategies that take individual variability into account, we need to
identify this variability by characterizing each person's individual baseline health state
instead of resorting to population-based variable distributions.
This health state baseline cannot, however, be determined from classical medical
records alone. Recent technological advances have created opportunities to harness additional sources of
biomedical data in real time, for instance through (1) mobile medical devices
for monitoring dedicated health parameters (insulin, heart rate, etc.) and (2) wearables [2, 3].
Having started out as simple devices for monitoring basic wellness parameters, these devices have in
recent years attracted considerable interest and effort from companies (e.g. Apple and Google) that
are keen on developing innovations bordering on wellness and healthcare. The synergy of these
two streams should provide a good estimate of the health state baseline.
To model the health state baseline and future scenarios, it is imperative
to include an important, yet largely missing, third component: the exposome. This term covers all
the exposures of an individual over a lifetime. So far it has mostly been associated with air quality, light,
climatic variations, ozone and volatile organic compounds. But we cannot forget the 'living'
component of the exposome. As dense human environments such as cities account for over half of
the world population [4] (80% in the EU), there is a need to build a molecular portrait of cities in order
to study what lives around us and how it affects our health and wellbeing [5].
∗ To whom correspondence should be addressed: [email protected]
[1] Dishman, E. 2012 [www.ey.com/GL/en/Industries/Life-Sciences/The-personal-health-technologyrevolution]
[2] Milenković, A., Otto, C., & Jovanov, E. Computer communications 2012. 29, 2521–2533.
[3] Bonaccorsi, M., Fiorini, L., Cavallo, F., Esposito, R., & Dario, P. Ambient Assisted Living 2015 465–475.
[4] Afshinnekoo, E., et al. Cell Systems 2015 1 72–87.
[5] The MetaSUB International Consortium Microbiome 2016 4 24
Prediction of non-interacting protein pairs for ’omics’ studies in translational
medicine
Lucjan Wyrwicz1
1 Center of Oncology, Warsaw, Poland
Protein–protein interactions (PPIs) play a vital role in most biological processes. Hence,
understanding them can promote a better understanding of the mechanisms underlying living systems.
However, besides the cost and time involved in detecting experimentally validated PPIs,
the noise in the data is still an important issue to overcome. In the last decade, several
in silico PPI prediction methods using both structural and genomic information were developed
for this purpose. Here we introduce a unique validation approach aimed at collecting reliable
non-interacting proteins (NIPs). Thereafter, the most relevant protein and protein-pair features were
selected. Finally, the prepared dataset was used for PPI classification, leveraging the prediction
capabilities of well-established machine learning methods. Our best classification procedure achieved
specificity and sensitivity values of 96.33% and 98.02%, respectively, surpassing the prediction
capabilities of other methods, including those trained on gold standard datasets. We showed that
PPI/NIP predictive performance can be considerably improved by focusing on data preparation.
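For readers less familiar with the two reported metrics, the sketch below shows how sensitivity and specificity follow from a confusion matrix. The label encoding (1 for an interacting pair, 0 for a non-interacting pair) and the toy labels are assumptions for illustration; they are not the data behind the 96.33% and 98.02% figures.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumed encoding: 1 = interacting pair (PPI), 0 = non-interacting pair (NIP).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # PPIs predicted as PPIs
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # PPIs missed
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # NIPs predicted as NIPs
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # NIPs predicted as PPIs
    sensitivity = tp / (tp + fn)  # fraction of true PPIs recovered
    specificity = tn / (tn + fp)  # fraction of true NIPs recovered
    return sensitivity, specificity


# Toy labels, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
print(sensitivity_specificity(y_true, y_pred))
```

Specificity depends directly on how trustworthy the negative examples are, which is one reason a curated set of NIPs matters for evaluating a PPI classifier.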
Healthy and unhealthy contact sites
Malgorzata Kotulska,1, ∗ Witold Dyrka,1 Bogumil Konopka,1 Monika Kurczyńska,1 and Pawel Woźniak1
1 Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology
Contact maps of proteins specify those amino acids that are located within a specified distance
limit of each other in the 3-dimensional structure of a protein. The standard thresholds are 8 Å for
distances between Cα or Cβ backbone atoms, or 6 Å if any atoms are considered. Contact
sites may provide information on intramolecular distances, useful in protein structure reconstruction
or in evaluating the correctness of a structure obtained from either experimental or modeling studies. Predicting
a contact map can also be the first stage in modeling an unknown protein structure. Moreover,
information on intermolecular contact sites is essential in molecular docking used for predicting and
engineering receptor binding, as applied in drug design. Research on contact sites encounters a few
issues, which will be discussed in this talk.
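The contact map definition above translates directly into code. The following is a minimal sketch that assumes one representative atom per residue (for example Cα) and treats a pair as a contact when the distance is at most the threshold; the function name, the `min_separation` parameter, and the toy coordinates are illustrative assumptions.

```python
import math


def contact_map(coords, threshold=8.0, min_separation=1):
    """Boolean contact map from one representative atom per residue.

    coords: list of (x, y, z) in Å, one per residue (e.g. Cα atoms).
    threshold: distance cutoff in Å (8 Å is the value quoted for Cα/Cβ).
    min_separation: smallest sequence separation |i - j| that is scored.
    """
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                cmap[i][j] = cmap[j][i] = True  # the map is symmetric
    return cmap


# Toy chain with residues 3.8 Å apart along the x axis, for illustration only.
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
for row in contact_map(chain):
    print("".join("#" if c else "." for c in row))
```

In practice `min_separation` is often raised so that trivially close sequence neighbours are ignored, but the choice of value is left to the user here.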
Firstly, bioinformatical prediction of a protein's contact sites from its amino acid sequence can
be an essential step in modeling protein structures. The best tools, however, still do not provide
sufficient accuracy. We will discuss different approaches and the problems that arise [1, 2].
Secondly, molecules of natural and mirror orientations share the same contact maps, which
poses a problem for their use. Proteins are 3-dimensional objects that can assume different
orientations, so for each protein several mirror structures could be constructed. The looking-glass
orientation may occur at the superficial level of a molecule's general symmetry or in the orientation
of its secondary structures. Deeper levels of inversion may concern the chirality of amino acids,
which may accompany mirror orientations at the higher levels. All these structures can also assume
similarly stable energy levels, hence all can exist in the real world. However, nature is not
symmetrical and prefers only one type of orientation: natural amino acids are left-handed, secondary
structures have a specific preference (e.g. helices are typically right-handed), and at the level of
general symmetry only one structure has evolved. Mirror molecules are of great interest to researchers
pursuing a project of "mirror life" or searching for non-degradable aptamers. But in bioinformatics
mirror structures can be troublesome. In the process of protein structure reconstruction
based on a contact map, two sets of models are generated, which may be energetically equivalent.
Selecting the structure that nature would have chosen is not always straightforward [3].
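Why a contact map cannot distinguish a structure from its mirror image can be shown in a few lines: a reflection preserves every pairwise distance, while a chirality-sensitive quantity changes sign. This sketch assumes a reflection through the yz-plane and uses a signed volume (scalar triple product) as the chirality indicator; the function names and the four toy points are illustrative assumptions.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """All inter-point distances, in a fixed pair order."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def mirror(coords):
    """Reflect every point through the yz-plane (x -> -x)."""
    return [(-x, y, z) for x, y, z in coords]


def signed_volume(a, b, c, d):
    """Scalar triple product of (b-a), (c-a), (d-a); its sign encodes handedness."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))


# Four toy points, for illustration only.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
flipped = mirror(native)
print(pairwise_distances(native) == pairwise_distances(flipped))
print(signed_volume(*native), signed_volume(*flipped))
```

Because every distance is unchanged, any contact map derived from the two sets of coordinates is identical, so additional chirality-aware information is needed to pick the native-like model.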
Finally, an uncontrolled change in the pattern of a peptide's contact sites is an issue that appears
in amyloid proteins. An irregular physiological pattern of contact sites may switch into very dense
contact sites that form zipper-like beta structures. There are a few examples in which this leads to
a desirable structure, one that is very durable in terms of its mechanical properties and not susceptible to
proteolytic enzymes. However, in most cases in biological tissue these dense aggregates trigger a
chain of events leading to cell death, e.g. in Alzheimer's and other neurodegenerative diseases.
The question we address is whether the onset of such unhealthy contacts is predictable and how
we can control the process [4–6].
∗ To whom correspondence should be addressed: [email protected]
[1] Konopka BM, Ciombor M, Kurczynska M, Kotulska M. J Membr Biol. 2014;247(5):409-20.
[2] Wozniak PP, Kotulska M. J Mol Model. 2014;20(11):2497.
[3] Kurczynska M, Kania E, Konopka BM, Kotulska M. J Mol Model. 2016;22(5):111.
[4] Gasior P, Kotulska M. BMC Bioinformatics. 2014;15:54.
[5] Wozniak PP, Kotulska M. Bioinformatics. 2015;31(20):3395-7.
Structural studies of medically-interesting protease inhibitors and lectins
Alexander Wlodawer,1 Jacek Lubkowski,1 Alla Gustchina,1 Dongwen Zhou,1 Michal Jakob,1, 2 Barry R. O’Keefe,2
Rodrigo da Silva Ferreira,3 Yara A. Lobo,3 Daiane Hansen,3 and Maria L. V. Oliva3
1 Macromolecular Crystallography Laboratory, Center for Cancer Research, National Cancer Institute, Frederick, MD 21702, USA
2 Molecular Targets Laboratory, Center for Cancer Research, National Cancer Institute, Frederick, MD 21702, USA
3 Departamento de Bioquímica, Universidade Federal de São Paulo, 04044-020 São Paulo, SP, Brazil; e-mail: [email protected]
Several protease inhibitors and lectins with anti-cancer properties have been investigated by
X-ray crystallography as well as by biochemical and biophysical techniques. Two of them are
potent inhibitors of trypsin-related enzymes. EcTI, isolated from the seeds of Enterolobium
contortisiliquum, inhibits the invasion of gastric cancer cells through alterations in the integrin-dependent
cell-signaling pathway. BbKI, found in Bauhinia bauhinioides seeds, is a kallikrein inhibitor with
a reactive site sequence similar to that of kinins, the vasoactive peptides inserted in kininogen
moieties. A much weaker protease inhibitor isolated from the bark of the Crataeva tapia tree
(CrataBL) also functions as a lectin. BfL, a GalNAc-specific lectin from the Brazilian orchid tree Bauhinia
forficata, was shown to inhibit the growth of several cancer lines. CGL, a lectin isolated from the sea
mussel Crenomytilus grayanus, was investigated based mainly on the similarity of its sequence to
another lectin, MytiLec, which was resistant to crystallization.
We determined high-resolution crystal structures of EcTI, both free and in complex with bovine
trypsin, in the process re-determining its amino acid sequence. Modeling of the putative complexes
of EcTI with several serine proteases and a comparison with equivalent models for other Kunitz
inhibitors elucidated the structural basis for the fine differences in their specificity. The structure
of free BbKI indicated that the presence of disulfide bonds is not necessary for stabilizing the
fold of the members of this family. A model of a complex of BbKI with plasma kallikrein indicates
the need for mutual rearrangement of the interacting molecules.
We have also determined the high-resolution crystal structure of glycosylated CrataBL. We have
shown that, as a lectin, CrataBL binds only sulfated oligosaccharides, most likely heparin and its
derivatives.
CGL displays antibacterial, antifungal, and antiviral activities, as well as high affinity for
mucin-type receptors, which are abundant on some cancer cells. We determined its crystal structure and
modeled the glycan-binding pockets, based on the location of the glycerol molecules bound in the
three sites exhibiting quasi-threefold symmetry.
A number of structures of BfL elucidated the mode of binding of its primary ligand, GalNAc,
as well as of a number of cancer-related Tn-antigens and blood group antigens, explaining the basis
of its very strict specificity, which is similar to the specificity of CGL despite a completely different
three-dimensional structure.
Application of molecular docking for predicting protein substrate specificity
Michal Laźniewski,1, 2, ∗ Krzysztof Kuchta,1 Dariusz Plewczyński,3 and Krzysztof Ginalski1
1 Laboratory of Bioinformatics and Systems Biology, Centre of New Technologies, University of Warsaw, Poland
2 Department of Physical Chemistry, Faculty of Pharmacy, Medical University of Warsaw, Poland
3 Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Poland
In recent years, the application of next generation sequencing has brought a substantial increase in
the number of known protein sequences. The RefSeq database alone now contains more than 60
million unique sequences, 50 times more than only ten years ago. For such an overwhelming amount
of data, functional assignment using only experimental techniques is a challenging task. With
structural genomics providing thousands of new protein structures [1], the additional information
might be utilized by structure-based methods to improve the quality of bioinformatics predictions.
Thus molecular docking might prove invaluable where other theoretical techniques fail to predict
the function of an analyzed protein.
It has been already shown that molecular docking can frequently predict the correct conformation of a small compound in a protein-ligand complex; however, calculating the binding energy
remains a challenge [2]. On the other hand, with most of the efforts focused on studying the
interactions between proteins and drug-like inhibitors, there only has been rather limited emphasis put on analyzing the proteins and their in vivo small molecule partners [3]. The goal of our
work was a comprehensive analysis of the performance of various molecular docking algorithms
in wide-scale predictions of substrate specificity of proteins from a single organism – Escherichia
coli. We analyzed all E. coli enzymes for which the crystal structure was solved together with
their cognate partner (the substrate or product of the reaction). Specifically, two sets of molecules
where docked to each enzymes: (a) compounds present in the active sites of all selected proteins
and (b) the entire metabolome of E. coli. The performance of four different programs (GOLD,
eHiTs, Surflex and Glide) were tested, including both the ability to identify the enzyme’s cognate
ligand and to correctly predict the ligand conformation. Moreover, we discuss if applying machine
learning methods could further increase the quality of such structure-based functional predictions.
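One way to quantify "the ability to identify the enzyme's cognate ligand" is to rank all docked compounds by score and record where the cognate ligand lands. The sketch below is only an illustration of that idea, not the authors' evaluation protocol: the enzyme names, compound names and scores are invented, and it assumes that a lower score means a better pose (some programs use the opposite convention, in which case the sort order would be reversed).

```python
from typing import Dict, List


def cognate_rank(scores: Dict[str, float], cognate: str) -> int:
    """Return the 1-based rank of the cognate ligand among all docked compounds.

    Assumption: lower (more negative) docking scores are better.
    """
    ordered = sorted(scores, key=lambda name: scores[name])
    return ordered.index(cognate) + 1


def top_k_success(ranks: List[int], k: int) -> float:
    """Fraction of enzymes whose cognate ligand is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical docking results: enzyme -> (compound scores, cognate ligand).
docking_results = {
    "enzyme_A": ({"ATP": -9.1, "glucose": -6.3, "pyruvate": -4.0}, "ATP"),
    "enzyme_B": ({"ATP": -7.5, "glucose": -8.2, "pyruvate": -5.1}, "pyruvate"),
}

ranks = [cognate_rank(scores, cognate) for scores, cognate in docking_results.values()]
print(ranks, top_k_success(ranks, 1))
```

Reporting the success rate for several values of k gives a simple way to compare docking programs on the same set of enzymes.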
∗
To whom the correspondence should be addressed: [email protected]
[1] J. Weigelt, Exp cell res, 2010, 316(8), 1332-1338.
[2] D. Plewczynski, M. Lazniewski, et al., J Comput Chem, 2011, 32(4), 742-755.
[3] A. Macchiarulo, I. Nobeli, et al., Nat Biotechnol, 2005, 22(8), 1039-1045.
A novel method for navigating optimal path for ligand escape from binding
site and application to drug design
Mai Suan Li,1, ∗ Quan Van Vuong,2 and Tin Trung Nguyen2
1 Polish Academy of Sciences, Warsaw, Poland
2 Institute for Computational Science and Technology, Ho Chi Minh City, Vietnam
In the first part of this talk I shall present our new method for finding the optimal path for pulling
a ligand out of the binding pocket using steered molecular dynamics (SMD). The optimal path
corresponds to the minimal hindrance to ligand displacement, where the hindrance is introduced as a scoring function.
In contrast to the existing CAVER method, our approach takes into account the geometry of the ligand,
leading to a better correlation between experimental inhibition constants and the mechanical work estimated by SMD.
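In SMD the mechanical work is commonly taken as the integral of the pulling force over the displacement along the pulling direction. The following is a minimal sketch of that calculation using the trapezoidal rule, assuming the force profile is available as sampled (displacement, force) pairs. The function name and the sample values are hypothetical and do not come from the authors' software.

```python
from typing import Sequence


def mechanical_work(displacements: Sequence[float], forces: Sequence[float]) -> float:
    """Approximate W = integral of F dx with the trapezoidal rule.

    displacements and forces are samples taken at the same time points;
    the result is in (force unit) x (length unit).
    """
    if len(displacements) != len(forces) or len(forces) < 2:
        raise ValueError("need two equally long sequences with at least 2 samples")
    work = 0.0
    for i in range(1, len(forces)):
        dx = displacements[i] - displacements[i - 1]
        work += 0.5 * (forces[i] + forces[i - 1]) * dx
    return work


# Invented force profile: the force rises, plateaus, then drops as the ligand leaves.
x = [0.0, 1.0, 2.0, 3.0]
f = [0.0, 10.0, 10.0, 0.0]
print(mechanical_work(x, f))
```

Comparing such work values across ligands pulled along their respective paths is what allows the correlation with experimental inhibition constants to be examined.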
In the second part, virtual screening, improved SMD and MM-PBSA methods are applied
to search for potential drugs for Alzheimer’s disease (AD) in large databases of natural and
synthesized compounds. The design strategy is based on the amyloid cascade hypothesis, which
posits that AD is caused by aggregation of amyloid beta (Aβ) peptides. Oligomers and protofibrils
were taken as drug targets as they seem to be more cytotoxic than mature fibrils. Some of the top-hit compounds predicted by our in silico study have already passed in vitro tests for inhibition
activity, blood-brain barrier crossing and non-toxicity to cells. Concerning peptide-based inhibitors,
we have shown that the presence of tryptophan and proline residues in tripeptides is crucial for their
tight binding to Aβ fibrils as well as for extensive fibril depolymerization. Fullerenes and their
derivatives were found to have a high binding affinity to Aβ and the ability to block Aβ aggregation.
The binding free energy scales linearly with the size of the fullerenes.
Finally, the application of our method to virtual screening of potential drugs for breast cancer
will be discussed.
∗
To whom the correspondence should be addressed: [email protected]
[1] J. Nasica-Labouze, Mai Suan Li, et al., Chemical Reviews 115, 3518-3563 (2015)
[2] M.H. Viet, K. Siposova, Z. Bednarikova, A. Antosova, T.T. Nguyen, Z. Gazova, and Mai Suan Li, J.
Phys. Chem. B 119, 5145 (2015).
[3] P.D.Q. Huy, and Mai Suan Li, Phys. Chem. Chem. Phys 16, 20030-20040 (2014)
[4] M. H. Viet, C-Y. Chen, C-K. Hu, Y-R. Chen, and Mai Suan Li, Plos One 8(11), e79151 (2013).
[5] P.D.Q. Huy, Y-C. Yu, S.T. Ngo, T.V. Thao, C-P Chen, Mai Suan Li, and Y-C Chen, BBA-General
Subjects 1380, 2960 (2013).
[6] Quan Van Vuong, Tin Trung Nguyen, and Mai Suan Li, J. Chem. Inf. Model. 55, 2731 (2015)
[7] Tin Trung Nguyen et al, Journal of Molecular Modeling (in press).
Peptide drug design using CABS-dock web server for protein-peptide docking
Mateusz Kurcinski,1 Maciej Blaszczyk,1 Maciej Pawel
Ciemny,1, 2 Andrzej Koliński,1 and Sebastian Kmiecik1, ∗
1 Faculty of Chemistry, University of Warsaw, ul. Pasteura 1, 02-093 Warszawa, Poland
2 Faculty of Physics, University of Warsaw, ul. Pasteura 5, 02-093 Warszawa, Poland
Peptides play essential functional roles in living organisms and have recently attracted much
attention for their potential therapeutic use. Therefore, structural characterization of protein-peptide interactions is a hot subject of current pharmaceutical research. Computational modeling
of the structure of protein–peptide interactions is usually divided into two stages: (1) prediction
of the binding site at a protein receptor surface, and then (2) docking (and modeling) the peptide
structure into the known binding site. We present the comprehensive CABS-dock method for the
simultaneous search for binding sites and flexible protein–peptide docking [1, 2]. CABS-dock is
freely available as a user-friendly web server at http://biocomp.chem.uw.edu.pl/CABSdock/. An
important feature that distinguishes CABS-dock from other state-of-the-art docking tools is the
ability to account for large-scale rearrangements of selected receptor fragments, and simultaneously
for full peptide flexibility, during explicit docking simulations [1–4]. This makes CABS-dock
a unique tool for modeling protein-peptide interactions associated with large-scale conformational
changes of both the peptide and protein receptor structures. The talk will outline the CABS-dock
methodology, its unique features and modeling opportunities.
∗
To whom the correspondence should be addressed: [email protected]
[1] M. Kurcinski, M. Jamroz, M. Blaszczyk, A. Kolinski, and S. Kmiecik. Nucleic Acids Res., 2015,
43(W1): p. W419-24
[2] M. Blaszczyk, M. Kurcinski, M. Kouza, L. Wieteska, A. Debinski, A. Kolinski, and S. Kmiecik.
Methods, 2016, 93: p. 72-83
[3] M. Kurcinski, A. Kolinski, and S. Kmiecik. J. Chem. Theory Comput., 2014, 10(6): p. 2224-2231
[4] M. P. Ciemny, M. Kurcinski, K. Kozak, A. Kolinski, S. Kmiecik. Methods Mol. Biol. (in press), 2016,
[arXiv:1605.09303]
Point-of-Care (PoC) testing kits. Focus on antibodies
Karina Kubiak-Ossowska1
1
ARCHIE-WeSt, University of Strathclyde, Glasgow, UK
Models for error and variants detection in de novo sequenced samples and
communities
Dimitar I. Vassilev1
1 Bioinformatics Group, AgroBio Institute and Joint Genomic Centre, Sofia, Bulgaria;
Faculty of Mathematics and Informatics, Sofia University, Bulgaria
Fitting suitable models for the discovery of errors and variants in de novo next-generation metagenomic sequencing is a difficult task. Even after extensive preprocessing, analysis and checking, the
datasets retain sequences from multiple microbial species, which contribute a considerable amount
of variation that conceals the errors. The application of standard denoising algorithms available
for genomics is no longer possible because of the high rate of false positives in regions with natural
variation in the data; at the same time, rare natural variants are a subject of study in which they
need to be distinguished from the errors.
This work uses both analytical and machine learning approaches to filter out some of the false positives of
other error and variant discovery algorithms applied in both metagenomics and polyploid NGS
studies. A neural network and a random forest have been trained to identify the errors in the
datasets with an accuracy of over 99%. While this is still insufficient for the direct discovery of rare errors, it
is demonstrated that the trained models provide a good filter that reduces the number of incorrectly
identified errors and decreases the error/variant ratio without an increase in false negatives.
The opportunities for implementing such models and improving their accuracy and running time in the current development of medicine provide a vast background for improving the quality
of diagnosis, therapies, practices, population studies, preventive actions and insurance.
Bioinformatics in genetics of rare disorders from molecular diagnostician point
of view
Katarzyna Wertheim-Tysarowska1, ∗
1
Department of Medical Genetics, Institute of Mother and Child
Rare diseases are defined as disorders affecting fewer than 1 in 2000 live births. The majority of them
are genetically determined, resulting from DNA mutations. More than 6000 inherited rare
disorders have been described so far, and this number tends to grow.
Human DNA, the “molecule of heredity”, contains about 3 billion base pairs, which encode around
20 000-30 000 genes (the total number of genes has not been established yet). So far, over 10 million
point variations (changes of one or a few nucleotides in the DNA sequence) have been identified in the human
genome. According to the Human Gene Mutation Database (HGMD), almost 150 000 of them, in
about 7000 genes, have been proved to be disease-causing.
Changes in the DNA sequence can occur during cell division and, if they arise during gamete
formation, they are passed down to the offspring. It is estimated that each individual carries
around 60 de novo point variations. Such mutations are further passed down to subsequent generations,
which is one of the evolutionary mechanisms responsible for population diversity. Nevertheless, the
functional effect of DNA variants can vary considerably. The majority of them do not influence
cellular processes. However, mutations can also be pathogenic and affect gene expression or protein
synthesis and activity. The final consequence of a nucleotide sequence change depends on several
factors, including its location, size and type.
The goal of the molecular diagnostics process is to find disease-causing mutations in an affected
person and interpret their clinical significance. The development of bioinformatics over the past few
years has significantly influenced molecular diagnostic procedures. Not only has access to information
resources been provided and simplified, but dedicated tools facilitating data analysis and
interpretation are also constantly being improved. I will present the state of the art in the field of molecular
diagnostics of rare disorders and the most important bioinformatics solutions that are in routine
use in this process.
∗
To whom the correspondence should be addressed: [email protected]
Comparative studies on proteins. The effect of the wrong assumptions on the
result accuracy and reliability
Jacek Leluk1, ∗
1 Department of Molecular Biology, Faculty of Biological Sciences, University of Zielona Góra, Poland
Multiple sequence alignment is the fundamental step of comparative protein and genomic
studies. It is an important intermediate data source used to obtain a consensus sequence defining
the whole protein family, to explain variability pathways, to locate structurally and functionally significant regions, and to derive many other results, not limited to the primary structure level.
The value and reliability of these results obviously depend strongly on the correct adjustment of the
aligned sequences. A related problem is the accurate location of gaps. Without both, all subsequent
results may be doubtful.
There are a number of algorithms for accomplishing the alignment procedure, which are implemented in many programs. Most of them refer to stochastic matrices of the observed nucleotide/amino acid replacement frequency. A tremendous number of applications follow the
Markovian model of mutational amino acid replacement.
This work demonstrates that approaches based on the Markovian model and applying stochastic
matrices such as PAM and BLOSUM contain some inconsistencies in interpreting the
protein variability occurring in nature [1]. They do not reflect the natural mechanism of molecular
evolution. The methods of gap location and continuity establishment are also not always justifiable
for homologous proteins. The proposed solution, based on the genetic semihomology approach,
takes into account both levels (nucleotide and amino acid) simultaneously, to reflect the natural
evolutionary process (consisting of two components: mutational variability and natural selection)
[2–4]. It applies a three-dimensional diagram of genetic relationships between amino acids instead
of stochastic matrices of the replacement frequency.
This work also discusses the parameters that must be taken into account to evaluate
whether the observed sequence identity/similarity is significant or coincidental. The problem of
identifying and locating correlated mutations in homologous proteins is presented as well [5–7].
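To make the idea of relating amino acids through the nucleotide level concrete, the sketch below computes the minimal number of nucleotide replacements needed to convert a codon of one amino acid into a codon of another, using the standard genetic code. Treating a pair as "semihomologous" when a single replacement suffices is a simplifying assumption made here for illustration; it is not the author's implementation, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codons_for(amino_acid: str) -> list:
    """All codons that encode the given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def min_nucleotide_changes(aa1: str, aa2: str) -> int:
    """Smallest number of positions in which any two codons of aa1 and aa2 differ."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons_for(aa1)
        for c2 in codons_for(aa2)
    )


def is_semihomologous(aa1: str, aa2: str) -> bool:
    """Illustrative criterion: the residues differ by a single nucleotide replacement."""
    return min_nucleotide_changes(aa1, aa2) == 1


for pair in [("D", "E"), ("M", "W"), ("F", "K")]:
    print(pair, min_nucleotide_changes(*pair), is_semihomologous(*pair))
```

In this picture an Asp/Glu replacement needs only one nucleotide change, while a Phe/Lys replacement needs three, which is information a frequency-based substitution matrix does not expose directly.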
∗ To whom the correspondence should be addressed: [email protected]
[1] J. Leluk, Computers & Chemistry, 2000, 24, 659-672.
[2] J. Leluk, Computers & Chemistry, 1998, 22, 123-131.
[3] J. Leluk, BioSystems, 2000, 56, 83-93.
[4] J. Leluk, B. Hanus-Lorenz, A.F. Sikorski, Acta Biochim. Polon., 2001, 48, 21-33.
[5] J. Leluk, Cell. Mol. Biol. Lett., 2000, 5, 91-106.
[6] Ly Le, J. Leluk, PLoS ONE, http://dx.plos.org/10.1371/journal.pone.0022970, 6(8) e22970, 2011.
[7] R. Filip, J. Leluk, G. Żaroffe, Adv. Biores., 2015, 6, 89-94
Machine Learning Trends in RNA Bioinformatics
Jacek Śmietański1, ∗
1
Faculty of Mathematics and Computer Science, Jagiellonian University
Many bioinformatics problems cannot be effectively solved using classical algorithms; therefore, more advanced techniques, including artificial intelligence and machine learning (ML), have been
introduced.
In short, we classify an algorithm as “machine learning” when it gives computers
the ability to learn without being explicitly programmed. The key property of such an algorithm is
the power to learn from, and make predictions on, a quite small sample data set. Its key feature
(as opposed to classical algorithms) is its generalization ability. The popular bioinformatics tasks
strongly connected to ML techniques are “classification” and “prediction”. Examples include the
classification of sequences, structures and interactions, as well as secondary and tertiary RNA structure
prediction, prediction of inter- and intramolecular interactions, structure-function prediction, and
molecular network modeling.
For each of the aforementioned challenges a number of dedicated algorithms have been developed and
implemented. Among the variety of ML algorithms, SVM (Support Vector Machines) and ANN
(Artificial Neural Networks) are the most commonly used in bioinformatics tasks. Others, such as
HMM (Hidden Markov Models), DT (Decision Trees) and RF (Random Forests), are also popular
[1].
A PubMed [2] search reveals over 500 articles connected with ML applied to RNA problems
that have been published to date. Over half of them were written within the last 3 years (figure 1). This
fact confirms the importance of ML methods in current bioinformatics.
This study presents in detail the current needs and trends in ML algorithms applied to RNA
challenges.
FIG. 1.
∗
To whom the correspondence should be addressed: [email protected]
[1] Jensen LJ, Bateman A. The rise and fall of supervised machine learning techniques. Bioinformatics.
2011;27(24):3331-3332. doi:10.1093/bioinformatics/btr585
[2] Roberts RJ (2001). ”PubMed Central: The GenBank of the published literature”. Proceedings of the
National Academy of Sciences, 2001 (98:2): 381–382
A Mixture Gaussian Bayesian Graphical Model in application to DNA
microarray segmentation robust to spatial noise
Maciej Sykulski,1, ∗ Boguslaw Kluge,1 and Anna Gambin1
1
Medical University of Warsaw, University of Warsaw
Bayesian graphical models and Gaussian mixtures are useful tools in image analysis and data
segmentation [1]. A Bayesian graphical model is learned from data using maximum a posteriori
probability (MAP) estimation. We propose a general formulation of a Mixture Gaussian Bayesian
Graphical model, which can be defined on a large graph, and for which the Expectation Maximization
algorithm is made tractable by solving a corresponding Quadratic Programming problem, with further
optimization using alternating directions. We implement an R package allowing for the declaration
of such models and for their EM optimization, which benefits from the sparsity of the formulation. We apply
this approach to formulate the Background and Segments Markov random Fields model (BSMF),
defined on two connected graphs: the spatial microarray grid graph and the genomic linear graph
connecting log2ratio data from a DNA microarray. Estimation of the BSMF model results in spatial
denoising of the log2ratio data and in a segmentation that recovers segments of different Copy Number.
We present results of our approach on real data from aCGH microarrays, compare its performance
with the Circular Binary Segmentation algorithm [2], and analyze its sensitivity to the setting of prior parameters.
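For readers unfamiliar with Expectation Maximization, the sketch below shows the plain EM iteration for a two-component, one-dimensional Gaussian mixture. It is a deliberately simplified stand-in: it has no graph structure, no priors, no quadratic programming step and no alternating directions, so it is not the BSMF model or the authors' R package. The data values and function names are invented for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_two_gaussians(data, mu, sigma, weight=0.5, iterations=50):
    """Fit a two-component 1D Gaussian mixture; weight is the share of component 0."""
    mu, sigma = list(mu), list(sigma)
    n = len(data)
    for _ in range(iterations):
        # E-step: responsibility of component 0 for every data point.
        resp = []
        for x in data:
            p0 = weight * normal_pdf(x, mu[0], sigma[0])
            p1 = (1.0 - weight) * normal_pdf(x, mu[1], sigma[1])
            resp.append(p0 / (p0 + p1))
        # M-step: re-estimate means, standard deviations and the mixing weight.
        n0 = sum(resp)
        n1 = n - n0
        mu[0] = sum(r * x for r, x in zip(resp, data)) / n0
        mu[1] = sum((1.0 - r) * x for r, x in zip(resp, data)) / n1
        var0 = sum(r * (x - mu[0]) ** 2 for r, x in zip(resp, data)) / n0
        var1 = sum((1.0 - r) * (x - mu[1]) ** 2 for r, x in zip(resp, data)) / n1
        sigma[0] = max(math.sqrt(var0), 1e-3)  # floor avoids a collapsing component
        sigma[1] = max(math.sqrt(var1), 1e-3)
        weight = n0 / n
    return mu, sigma, weight


# Invented log2ratio values: a copy-neutral group near 0 and a gained group near 0.58.
log2ratios = [-0.05, 0.0, 0.05, 0.02, -0.02, 0.55, 0.60, 0.62, 0.58]
means, sds, w = em_two_gaussians(log2ratios, mu=[0.0, 0.5], sigma=[0.2, 0.2])
print(means, sds, w)
```

The model described in the abstract additionally couples neighbouring probes through the spatial and genomic graphs, which is what this independent-points sketch leaves out.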
FIG. 1. Factor graph for BSMF(Θ, Z; x, Ω) model. x is log2ratio data. a, b are segmentation, and spatial
Markov random fields respectively. z, y are indicators of breaks in the a, b fields.
∗
To whom the correspondence should be addressed: [email protected]
[1] Zhang, Y. and Brady, M. and Smith, S., Segmentation of brain MR images through a hidden Markov
random field model and the expectation-maximization algorithm, Medical Imaging, 2001, 20, 45–57.
[2] Olshen, A. and Venkatraman, ES and Lucito, R. and Wigler, M., Circular binary segmentation for the
analysis of array-based DNA copy number data, Biostatistics, 2004, 5, 557–572.
Application of the Molecular Dynamics Simulations in Health Related Issues
Lukasz Peplowski,1, ∗ Rafal Jakubowski,1 and Wieslaw Nowak1
1
Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University in Toruń
The development of new, innovative and effective drugs is nowadays the most desirable research target
of all pharmaceutical companies. To reach this goal, the molecular basis of a disease should be found.
Most experiments are very expensive, so theoretical approaches come into play. One of
the best methods to acquire dynamical pictures of structural changes of drug targets is molecular
dynamics (MD).
In this work we show results of MD simulations focused on (1) a drug delivery problem and two health-related protein systems: (2) exchange protein directly activated by cAMP 2 (Epac2) [1, 2] and (3)
transthyretin [3], a thyroid hormone transporter.
FIG. 1. Model of cell membrane piercing at t = 0. A SW zigzag carbon nanotube with cisplatin is on the
right side of figure.
(1) The first topic is related to bionanotechnological support for drug delivery. Recognition of a cancer cell and subsequent injection of a medicine through the cell membrane can increase
the effectiveness of therapy and decrease side effects. Here we present results of MD modeling of the injection of a
popular anti-cancer drug, cisplatin, through the membrane (Fig. 1). A
carbon nanotube was used as a nano-needle. Simulations show that it is possible to transport the drug in a carbon nanotube open at both
ends through a model cell membrane.
(2) The second topic is a hot one: optogenetics. We show how, at the molecular level, the recently developed
photoswitch JB253 affects Epac2, a protein involved in insulin release from pancreatic cells. It
is possible to control this process with light, using azobenzenes as photoactive molecules. Here
we show the structural impact of such light-induced switching on the Epac2 protein.
(3) The last issue is related to genetics and the effects of the L55P and V30M mutations on
transthyretin, a thyroid hormone transporter. Numerous mutations in this protein cause
the formation of amyloid fibrils and in effect lead to diseases such as familial amyloid polyneuropathy
or schizophrenia. Our MD calculations and careful analysis show almost unnoticeable,
but in our opinion critical, effects of the mutations.
∗
To whom the correspondence should be addressed: [email protected]
[1] J. Broichhagen, M. Schönberger, et al., Nat. Commun., 2014, 5:5116
[2] K. Herbst, C. Coltharp, et al., Chem. Biol., 2011, 18:2, 243-51
[3] D. Trivella, L. Bleicher, et al., J. Struct. Biol., 2010, 170, 522–531
LINCS Molecular Signatures as a Resource for Personalized Precision Medicine
Jarek Meller1, ∗
1
Cincinnati Children’s Hospital Medical Center, Cincinnati, USA
Biomedicine is increasingly becoming data driven and data centric. Multiple concerted efforts
are under way to collect massive amounts of biomedical data, including those that pertain to
personalized precision medicine. A prime example of such an effort is the LINCS consortium, which stems from the Connectivity Map
and aims to generate a comprehensive library of network-based molecular signatures of chemical
(∼30,000 small drug-like molecules) and genetic (∼20,000 gene knockdowns) perturbations in a
number of cell lines (∼100-1,000 different cell lines). Potential
applications of the LINCS library in the context of mechanistic studies as well as personalized precision
interventions are discussed, with a special emphasis on proteomic profiles and the piLINCS resource
that we developed recently as part of the LINCS/BD2K Data Integration and Coordination Center.
∗
To whom the correspondence should be addressed: [email protected]
Posters
Application of new functionalities of RNAComposer in order to improve
prediction accuracy
Maciej Antczak,1, 2, ∗ Mariusz Popenda,2, 3 Tomasz Zok,1, 2 Joanna Sarzynska,2, 3
Tomasz Ratajczak,1, 2 Ryszard W. Adamiak,1, 2, 3 and Marta Szachniuk1, 2, 3
1 Institute of Computing Science, Poznan University of Technology, Poznan, Poland
2 European Center for Bioinformatics and Genomics, Poznan University of Technology, Poznan, Poland
3 Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
The roles that RNA molecules play in many cellular processes are strongly dependent on their
tertiary structure. In contrast to the protein field, a much smaller number of RNA 3D structures
has been determined by experimental methods and deposited in structural data banks. Therefore,
computational methods designed for tertiary structure prediction of RNAs are of great importance.
In recent years over a dozen methods devoted to RNA 3D structure prediction have been developed
[1]. Among them, RNAComposer, a fully automated system for 3D structure prediction of large
RNAs, has been introduced by our team [2]. Since its inception in 2012, RNAComposer has remained one
of the most popular resources, continually applied for scientific and academic purposes. However,
an in-depth analysis of 3D models generated by RNAComposer reveals that in a number of cases
the accuracy of prediction might be improved [3]. Thus, we have extended the functionality of our
system to allow users to apply their own structural elements in the prediction process and to influence
the search through the database of available RNA 3D structure elements. Moreover, we have
also incorporated three new in silico methods, which lead to a greater diversity of the resultant RNA
3D models. The introduced functionality contributes to a significant improvement in the reliability of the predicted
3D models, which was observed when RNAComposer was applied to the
modelling of 3D structures of precursors of miR160 family members.
This work was supported by grants from the National Science Center, Poland [2012/05/B/ST6/03026,
2012/06/A/ST6/00384].
∗
To whom the correspondence should be addressed: [email protected]
[1] D. Dufour, M.A. Marti-Renom, Software for predicting the 3D structure of RNA molecules, Wiley
Interdisciplinary Reviews: Computational Molecular Science, 2015, 5, 56-61.
[2] M. Popenda, M. Szachniuk, M. Antczak, K.J. Purzycka, P. Lukasiak, N. Bartol, J. Blazewicz, R. W.
Adamiak, Automated 3D structure composition for large RNAs, Nucleic Acids Research, 2012, 40(14),
e112.
[3] Z. Miao, R.W. Adamiak, M.-F. Blanchet, M. Boniecki, J.M. Bujnicki, S.-J. Chen, C. Cheng, G. Chojnowski, F.-C. Chou, P. Cordero, J.A. Cruz, A. Ferre-D’Amare, R. Das, F. Ding, N.V. Dokholyan,
S. Dunin-Horkawicz, W. Kladwang, A. Krokhotin, G. Lach, M. Magnus, F. Major, T.H. Mann, B.
Masquida, D. Matelska, M. Meyer, A. Peselis, M. Popenda, K.J. Purzycka, A. Serganov, J. Stasiewicz,
M. Szachniuk, A. Tandon, S. Tian, J. Wang, Y. Xiao, X. Xu, J. Zhang, P. Zhao, T. Zok, E. Westhof,
RNA-Puzzles Round II: Assessment of RNA structure prediction programs applied to three large RNA
structures, RNA, 2015, 21, 1-19.
Modeling inhibitory mechanisms in biological systems using Petri nets
Paulina Boguslawska1, ∗ and Piotr Formanowicz1, 2
1 Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznań, Poland
2 Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznań, Poland
Petri nets were first described in 1962 by Carl Adam Petri [4]. Due to their
properties they have been widely used to model and analyse technical systems (cf. [5]). In 1952 the British
neurophysiologists Alan Lloyd Hodgkin and Andrew Fielding Huxley were among the first scientists
to use a systems approach to explain a biological process. They developed a mathematical
model aimed at explaining how the action potential propagates along the axon of a neuronal cell [2].
Currently, Petri nets are increasingly being used to create models of biological systems [3]. An
analysis of such models is based primarily on t-invariants, which correspond to some subprocesses
occurring in the modeled system. Similarities are looked for in the set of t-invariants, and on the
basis of them some unknown properties of the system can be deduced. Here, methods based on
clustering and MCT sets (maximal common transition sets) are used [1].
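As a reminder of what a t-invariant is: for a net with incidence matrix C (rows are places, columns are transitions), it is a non-zero vector x of non-negative integers with C·x = 0, meaning that firing every transition t exactly x[t] times returns the net to its starting marking. The sketch below checks this condition for a toy net; the net and the function name are hypothetical and serve only to illustrate the definition.

```python
from typing import List


def is_t_invariant(incidence: List[List[int]], x: List[int]) -> bool:
    """Check whether x is a t-invariant of the net described by the incidence matrix.

    incidence[p][t] = tokens produced in place p by transition t
                      minus tokens consumed from p by t.
    """
    if not any(x) or any(v < 0 for v in x):
        return False  # must be non-zero and non-negative
    return all(
        sum(row[t] * x[t] for t in range(len(x))) == 0
        for row in incidence
    )


# Toy cyclic net: t1 moves a token from p1 to p2, t2 moves it back.
C = [
    [-1, 1],   # p1: consumed by t1, produced by t2
    [1, -1],   # p2: produced by t1, consumed by t2
]

print(is_t_invariant(C, [1, 1]))  # firing t1 and t2 once each restores the marking
print(is_t_invariant(C, [1, 0]))  # firing only t1 changes the marking
```

An inhibitor arc cannot be expressed as an entry of C, which gives some intuition for why invariant-based analysis does not carry over directly to nets that use such arcs.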
In biology, some processes contain feedback and inhibitory mechanisms. For the sake of
accurate modeling of biological processes, it is important to properly describe both of
these mechanisms in the models. Unfortunately, the use of inhibitory arcs makes it impossible to
use the analysis methods based on t-invariants. Therefore, it is necessary to develop methods of
modeling inhibitory mechanisms based only on the basic components of Petri nets. In this work
some ideas for solving this problem are presented.
∗
To whom the correspondence should be addressed: [email protected]
[1] D. Formanowicz, P. Formanowicz, T. Glowacki, A. Kozak, M. Radom, Hemojuvelin-hepcidin axis
modeled and analyzed using Petri nets, Journal of Biomedical Informatics, 2013, 46, 1030-1043.
[2] A. L. Hodgkin, A. F. Huxley, A quantitative description of membrane current and its application to
conduction and excitation in nerve, The Journal of Physiology, 1952, 117(4), 500-544.
[3] Koch I, Reisig W, Schreiber F, editors. Modeling in systems biology: the Petri net approach. London:
Springer; 2011.
[4] C. A. Petri, Kommunikation mit Automaten, Institut für Instrumentelle Mathematik, Bonn, 1962
[5] J.-M. Proth, X. Xie, Petri Nets: A Tool for Design and Management of Manufacturing Systems, John
Wiley & Sons, Inc., 1997.
AmyloGram: n-gram analysis and prediction of amyloids
Michal Burdukiewicz,1, ∗ Piotr Sobczyk,2 Pawel Mackiewicz,1 and Malgorzata Kotulska3
1 Department of Genomics, Faculty of Biotechnology, University of Wroclaw
2 Faculty of Pure and Applied Mathematics, Wroclaw University of Science and Technology
3 Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology
Amyloids are proteins associated with a number of clinical disorders (e.g., Alzheimer’s,
Creutzfeldt-Jakob’s and Huntington’s diseases). Despite their diversity, all amyloid proteins can
undergo aggregation initiated by 6- to 15-residue segments called hot spots. As a result, amyloids
form unique and often zipper-like β-structures, which can turn out to be harmful [1].
To find patterns defining the hot spots, we analyzed n-grams (continuous or discontinuous sequences of n elements) extracted from amyloidogenic and non-amyloidogenic peptides collected
in the AmyLoad database [2]. Using 1- to 3-grams with gaps as the input data, we trained random forests as predictors of amyloidogenicity. The results were validated in a cross-validation
procedure, using predictors trained on data sets with peptides of different lengths. The classification efficiency evaluated as specificity was the best for learners trained on the shortest sequences,
whereas predictors based on the longest sequences in the training dataset were characterized by greater
sensitivity. Since amyloidogenicity may depend not on the exact sequence of amino acids
but on more general properties of the amino acids in the sequence, we constructed 524 284 reduced
amino acid alphabets of different lengths (three to six letters) based on all possible combinations
of handpicked physicochemical properties of the amino acids. The cross-validation of predictors employing the different alphabets revealed the best-performing alphabet with the length of
6 amino acid residues. It also outperformed predictors relying on the alphabet based on all 20
amino acids. The reduced alphabet is based on four main properties describing amyloidogenicity:
hydrophobicity, average flexibility indices, polarizability and thermodynamic β-sheet propensity.
We designed a distance measure to compare differences between the alphabets. The correlation
between the distance from the best-performing reduced amino acid alphabet and the obtained AUC
value was negative (-0.44) and significant (p-value smaller than 2.2e-16), which supports our result
that reduced amino acid alphabets lead to more accurate models of peptides involved in amyloidogenicity. Since n-grams create very large feature spaces, we developed the Quick Permutation
Test (QuiPT) for the selection of the most informative attributes. We found 65 n-grams that are
the most relevant to the discrimination of amyloid and non-amyloid sequences. The aliphatic and
hydrophobic amino acids (isoleucine, leucine, valine) commonly occur in n-grams associated with
amyloidogenicity, while their aromatic counterparts (phenylalanine, tryptophan and tyrosine) are
less frequent. The best-performing predictors using the full and the reduced
amino acid alphabets were benchmarked against the most popular tools for amyloid peptide detection using
an external dataset. All forests learned on n-grams outperformed the existing software, but only the
predictor based on the best-performing reduced amino acid alphabet, AmyloGram, obtained
the highest values of the performance measures (AUC: 0.90, MCC: 0.63).
AmyloGram is available as a web server: www.smorfland.uni.wroc.pl/amylogram/.
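The two feature-engineering ideas described above, recoding peptides in a reduced alphabet and counting n-grams with gaps, can be sketched as follows. The four-group alphabet below is invented purely for illustration and is not the alphabet selected by AmyloGram; the function names are likewise hypothetical and do not correspond to the AmyloGram code.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical reduced alphabet: four groups covering all 20 amino acids.
GROUPS = {
    "a": "AVLIMC",   # aliphatic / hydrophobic
    "b": "FWY",      # aromatic
    "c": "STNQGPH",  # polar or small
    "d": "DEKR",     # charged
}
REDUCED: Dict[str, str] = {aa: label for label, members in GROUPS.items() for aa in members}


def encode(peptide: str) -> str:
    """Rewrite a peptide in the reduced alphabet."""
    return "".join(REDUCED[aa] for aa in peptide)


def gapped_ngrams(sequence: str, gaps: Tuple[int, ...]) -> Counter:
    """Count n-grams whose consecutive elements are separated by the given gaps.

    gaps=(0,) yields ordinary bigrams, gaps=(1,) bigrams that skip one position,
    and gaps=(0, 1) trigrams with one skipped position before the last element.
    """
    offsets = [0]
    for gap in gaps:
        offsets.append(offsets[-1] + gap + 1)
    span = offsets[-1] + 1
    return Counter(
        "".join(sequence[start + o] for o in offsets)
        for start in range(len(sequence) - span + 1)
    )


reduced = encode("VIVFDE")
print(reduced)
print(gapped_ngrams(reduced, (0,)))
print(gapped_ngrams(reduced, (1,)))
```

Counts such as these, collected for every gap pattern of interest, would form the feature vector passed to a classifier such as a random forest.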
∗
To whom the correspondence should be addressed: [email protected]
[1] Sawaya, M.R., Sambashivan, S., Nelson, R., Ivanova, M.I., Sievers, S.A., Apostol, M.I., Thompson, M.J.,
Balbirnie, M., Wiltzius, J.J.W., McFarlane, H.T., Nature, 2007, 447, 453–457.
[2] Wozniak, P.P., and Kotulska, M., Bioinformatics, 2015, 31, 3395–3397.
Selected aspects of the participation of tobacco smoke in the development of
atherosclerosis modeled and analyzed using Petri nets
Kaja Chmielewska,1, ∗ Dorota Formanowicz,2 and Piotr Formanowicz1, 3
1 Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznań, Poland
2 Department of Clinical Biochemistry and Laboratory Medicine, Poznan University of Medical Sciences, Grunwaldzka 6, 60-780 Poznań, Poland
3 Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznań, Poland
Endothelial dysfunction is induced by various factors, including high blood pressure,
high glucose level (in diabetes), high low-density lipoprotein (LDL) level and smoking in the wider
sense of the term. For a better understanding of the influence of cigarette smoking on endothelial
damage and on the stimulation of inflammation and prothrombotic states, a model of this complex
biological process is presented. The proposed model has been built using Petri nets [3]. The Petri
net structure includes the indirect effects of cigarette smoke on the impairment of endothelial function,
caused by, among others: a change in the lipid profile (an increase in the amount of LDL followed
by LDL oxidation, which in turn promotes atherosclerosis); a decrease in the amount of
tetrahydrobiopterin (BH4) (which inhibits nitric oxide (NO) synthesis, because
BH4 is a cofactor for endothelial nitric oxide synthase, which is necessary for NO synthesis); and
an increase in the amount of many other important factors that influence the development of
unwanted processes (oxidative stress, oxidation of LDL, proliferation of vascular smooth muscle
cells) [1]. In addition, smoking stimulates inflammation and prothrombotic states, which lead
to the development of atherosclerosis. For this reason, the model includes the harmful effects of
macrophages (which stimulate the development of plaque) and the dual role of NO (which depends
on the amount of this molecule). The analysis of the proposed model has been based mainly on
t-invariants. To determine the biological sense of the model, an analysis of clusters of t-invariants has
been performed (cf. [2]).
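To illustrate the notion of a t-invariant used in the analysis above, the following minimal Python sketch checks whether a vector of transition firing counts x is a t-invariant, i.e. a non-zero, non-negative integer vector with C·x = 0, where C is the incidence matrix (places by transitions). The two-place toy net and all names in the sketch are hypothetical; they are not taken from the presented model.

```python
def is_t_invariant(incidence, x):
    """Return True if x is a t-invariant of the net with the given incidence matrix.

    incidence[p][t] is the net change of tokens in place p when transition t fires once.
    """
    if any(v < 0 for v in x) or not any(x):
        return False  # t-invariants are non-negative and non-zero
    # C * x == 0: firing every transition t exactly x[t] times restores the marking
    return all(sum(c * v for c, v in zip(row, x)) == 0 for row in incidence)


# Hypothetical toy net: t0 moves a token from p0 to p1, t1 moves it back.
TOY_INCIDENCE = [
    [-1, 1],   # p0
    [1, -1],   # p1
]

cycle = is_t_invariant(TOY_INCIDENCE, [1, 1])       # t0 then t1 restores the marking
half_cycle = is_t_invariant(TOY_INCIDENCE, [1, 0])  # t0 alone changes the marking
```

In a biological net, the transitions with non-zero entries in a t-invariant can be read as a subprocess that leaves the state of the system unchanged.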
This research has been partially supported by the Polish National Science Centre grant No.
2012/07/B/ST6/01537.
∗ To whom correspondence should be addressed: [email protected]
[1] J.A. Ambrose, R.S. Barua, The pathophysiology of cigarette smoking and cardiovascular disease: An
update, Journal of the American College of Cardiology, 2004, 43, 1731-1737.
[2] D. Formanowicz, P. Formanowicz, T. Glowacki, A. Kozak, M. Radom, Hemojuvelin–hepcidin axis
modeled and analyzed using Petri nets, Journal of Biomedical Informatics, 2013, 46, 1030-1043.
[3] T. Murata, Petri Nets: Properties, Analysis and Applications, Proceedings of the IEEE, 1989, 77, 541-580.
CABS-dock web server for flexible docking of peptides to proteins
Maciej Pawel Ciemny,1, ∗ Mateusz Kurcinski,2 Maciej Blaszczyk,2 Andrzej Kolinski,2 and Sebastian Kmiecik2
1 Faculty of Physics, University of Warsaw, ul. Pasteura 5, 02-093 Warszawa, Poland
2 Faculty of Chemistry, University of Warsaw, ul. Pasteura 1, 02-093 Warszawa, Poland
Protein–peptide interactions play essential functional roles in living organisms, and their
structural characterization is a hot subject of current experimental and theoretical research.
Protein–peptide molecular docking is a difficult modeling problem, especially if large-scale
conformational changes of the receptor are involved. In this work, we present blind-docking results
for protein–peptide interactions obtained using our CABS-dock method [1–3], which allows for
large-scale conformational changes during on-the-fly docking. While most other algorithms require
a pre-defined location of the binding site, CABS-dock does not require such knowledge. Given a
protein receptor structure and a peptide sequence (and starting from random conformations and
positions of the peptide), CABS-dock performs a simulation search for the binding site, allowing for
full flexibility of the peptide and small fluctuations of the receptor backbone. We present example
CABS-dock results obtained in the default CABS-dock mode and using its advanced options,
which enable the user to increase the range of flexibility for chosen receptor fragments or to
exclude user-selected binding modes from the docking search. The CABS-dock web server is available from
http://biocomp.chem.uw.edu.pl/CABSdock/.
∗ To whom correspondence should be addressed: [email protected]
[1] Kurcinski, M., M. Jamroz, M. Blaszczyk, A. Kolinski, and S. Kmiecik, CABS-dock web server for the
flexible docking of peptides to proteins without prior knowledge of the binding site. Nucleic Acids Res.,
2015, 43(W1): p. W419-24.
[2] Blaszczyk, M., M. Kurcinski, M. Kouza, L. Wieteska, A. Debinski, A. Kolinski, and S. Kmiecik, Modeling
of protein-peptide interactions using the CABS-dock web server for binding site search and flexible
docking. Methods, 2016, 93: p. 72-83.
[3] Kurcinski, M., A. Kolinski, and S. Kmiecik, Mechanism of Folding and Binding of an Intrinsically
Disordered Protein As Revealed by ab Initio Simulations. J. Chem. Theory Comput., 2014, 10(6): p.
2224-2231.
Computing maximal cliques on GPU – application to structural alignments
Pawel Daniluk,1, 2, 3, ∗ Tymoteusz Oleniecki,1, 2, 3, † and Grzegorz Firlik1, 2, 3
1 Department of Biophysics, Faculty of Physics, University of Warsaw, Żwirki i Wigury 93, 02-089 Warsaw, Poland
2 Bioinformatics Laboratory, Mossakowski Medical Research Centre, Polish Academy of Sciences, Pawińskiego 5, 02-106 Warsaw, Poland
3 College of Inter-Faculty Individual Studies in Mathematics and Natural Sciences, University of Warsaw, Żwirki i Wigury 93, 02-089 Warsaw, Poland
Finding a maximum clique is a well-known NP-hard problem. Several heuristics can be used
to solve it; however, most are difficult to parallelize and therefore hard to implement on
GPUs. In this work we present a heuristic based on the Motzkin-Straus theorem [1], which relates
the maximum clique problem to finding the maximum of a certain quadratic form. This heuristic
has proven effective in several studies (e.g. [2]), but an efficient parallel implementation
for graphics processors (GPUs) has not been demonstrated yet [3]. In the presented approach an
iterative procedure based on replicator equations is used to seek local maxima. Optimizations which
prevent convergence to local maxima (corresponding to suboptimal cliques) or to spurious solutions
(not corresponding to any clique) are also proposed.
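The replicator iteration mentioned above can be sketched in a few lines. The Motzkin-Straus theorem states that the maximum of x^T A x over the standard simplex equals 1 - 1/ω, where A is the adjacency matrix and ω the clique number. The discrete replicator update x_i ← x_i (Ax)_i / (x^T A x) never decreases this quadratic form. The sketch below is a plain, sequential Python illustration on a hypothetical toy graph; it is not the authors' CUDA implementation and contains none of their optimizations against suboptimal or spurious solutions.

```python
def replicator_clique(adjacency, iterations=200, threshold=1e-6):
    """Run discrete replicator dynamics for x^T A x on the standard simplex.

    adjacency is a symmetric 0/1 matrix with a zero diagonal.
    Returns the vertices whose weight stays above the threshold.
    """
    n = len(adjacency)
    x = [1.0 / n] * n  # start at the barycenter of the simplex
    for _ in range(iterations):
        ax = [sum(a * v for a, v in zip(row, x)) for row in adjacency]
        total = sum(xi * axi for xi, axi in zip(x, ax))  # x^T A x
        if total == 0.0:
            break  # graph without edges: nothing to update
        x = [xi * axi / total for xi, axi in zip(x, ax)]
    return [i for i, xi in enumerate(x) if xi > threshold]


# Hypothetical toy graph: triangle 0-1-2 plus a pendant vertex 3 attached to 2.
TOY_GRAPH = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
clique = replicator_clique(TOY_GRAPH)
```

On this toy graph the weight of vertex 3 decays geometrically and the support converges to the triangle. The dominant cost per iteration is the matrix-vector product, which is the step the abstract identifies as the main target for GPU parallelization.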
The procedure has been implemented in CUDA C. Apart from the usual data parallelism typical
of GPU programming, we have implemented pipelining to ensure that the most computationally
intensive task (i.e. vector-matrix multiplication) is not interrupted by less significant tasks
(such as vector normalization or node elimination). Altogether we were able to achieve a 24-fold
speedup over the CPU solution for random graphs with 32 thousand nodes.
We present a proof-of-concept application of the Motzkin-Straus clique-finding heuristic to
computing structural alignments of proteins. Local similarities are detected using local descriptors
of protein structure [4]. The alignment is built by finding the largest set of non-contradicting local
alignments, which are represented as cliques in a graph. The size of a clique directly corresponds
to the size of the alignment (number of aligned residues or long-distance contacts). This method
is an improvement of our DEDAL algorithm [5].
These studies were supported by the research grant (DEC-2011/03/D/NZ2/02004) of the National Science Centre.
∗ To whom correspondence should be addressed: [email protected]
† Presenting author: [email protected]
[1] T. S. Motzkin, E. G. Straus, Maxima for graphs and a new proof of a theorem of Turán, Canad. J.
Math, 1965, 17.4, 533–540.
[2] I. M. Bomze, M. Budinich, M. Pelillo, C. Rossi, Annealed replication: A new heuristic for the maximum
clique problem, Discrete Applied Mathematics, 2002, 121(1-3), 27–49.
[3] R. Cruz, N. Lopez, C. Trefftz, Parallelizing a Heuristic for the Maximum Clique Problem on GPUs and
Clusters of Workstations, IEEE International Conference on Electro-Information Technology, EIT 2013,
2013.
[4] P. Daniluk, B. Lesyng, Theoretical and Computational Aspects of Protein Structural Alignment, in
A. Liwo (Ed.), Computational Methods to Study the Structure and Dynamics of Biomolecules and
Biomolecular Processes, Springer Berlin / Heidelberg, 2014, pp. 557–598.
[5] P. Daniluk, B. Lesyng, A novel method to compare protein structures using local descriptors, BMC
Bioinformatics, 2011, 12(1), 344.
Epigenetic modifications in RNA: Molecular dynamics studies
Indrajit Deb,1, 2, ∗ Joanna Sarzynska,1 and Ryszard Kierzek1
1 Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
2 Department of Biophysics, Molecular Biology & Bioinformatics, University of Calcutta, Kolkata 700009, West Bengal, India
FIG. 1. (A) RNA duplex with PSU and (B) RNA hairpin derived from the m6A-switch in MALAT1 with
m6A modifications.
Epigenetic mechanisms of gene regulation, apart from chemical modifications to DNA, also include
modifications to messenger RNA (mRNA) molecules. These modifications are dynamically and
reversibly regulated by specific enzymes. The most abundant and epigenetically important
internal modifications to RNA are pseudouridine (PSU) (Figure 1A) and N6-methyladenosine
(m6A) (Figure 1B) [1, 2]. There is plenty of scope to investigate theoretically and validate
experimentally the structural consequences of these naturally occurring post-transcriptional
modifications to RNA. To deal with this kind of problem, one needs an accurate set of force field
parameters for both the standard and modified residues.
The currently available force field parameters for modified RNA residues [3] in the AMBER
molecular modeling package show significant deviations in conformational properties from experimental
observations [4]. An examination of the transferability of the recently revised torsion parameters
revealed an overall improvement in the conformational properties for some of the
modifications, but the improvements were still insufficient to describe the sugar pucker preferences
[4]. Here, we report an approach for the development and fine-tuning of the AMBER force field
parameters for modified RNAs. The χ torsion parameters were reparameterized at the individual
nucleoside level. The effects of combining the revised γ torsion parameters [5] and of modifying the
Lennard-Jones σ parameters were also tested by directly comparing the conformational preferences
obtained from our extensive molecular dynamics simulations with those from experimental
observations [6].
∗ To whom correspondence should be addressed: [email protected]
[1] S. Schwartz, D. A. Bernstein, M. R. Mumbach et al., Cell, 2014, 159, 148-162.
[2] N. Liu, Q. Dai, G. Zheng et al., Nature, 2015, 518, 560-564.
[3] R. Aduri, B. T. Psciuk, P. Saro et al., J. Chem. Theory Comput., 2007, 3, 1464-1475.
[4] I. Deb, J. Sarzynska, L. Nilsson et al., J. Chem. Inf. Model., 2014, 54, 1129-1142.
[5] A. Perez, I. Marchan, D. Svozil et al., Biophys. J., 2007, 92, 3817–3829.
[6] I. Deb, R. Pal, J. Sarzynska et al., J. Comput. Chem. 2016, DOI:10.1002/jcc/24374.
Computational study on hemagglutinin antigenic sites of influenza virus type A
Rafal Filip1, ∗ and Jacek Leluk1
1 University of Zielona Góra, Faculty of Biological Sciences, Department of Biological Sciences, Zielona Góra, Poland
Hemagglutinin (HA) is a glycoprotein located on the surface of influenza virions. The function
of HA is attachment to the host cells being infected, which is one of the key processes in the
influenza A virus replication cycle [1]. HA is also a target for the host immune system [2]. Each of
the three HA subunits has five epitopes on its surface, described in the literature as Sa, Sb, Ca1,
Ca2 and Cb [3, 4]. These antigenic sites, as well as the entire HA protein, are characterized by a
high level of variability. This high mutation rate of HA causes seasonal epidemics and pandemics,
because the host organism must acquire new humoral immunity [5]. In this study we analyzed the HA
antigenic sites by searching for correlated mutations in aligned hemagglutinin sequences. The
original software (Corm) was used to identify and locate clusters of correlated positions [6].
Several interrelationships between positions were found that have not been reported
previously. These data are potentially significant for explaining the characteristics
of amino acid mutations in the antigenic sites.
∗ To whom correspondence should be addressed: [email protected]
[1] D. C. Wiley, J. J. Skehel, The Structure and Function of the Hemagglutinin Membrane Glycoprotein of
Influenza Virus. Annual Review of Biochemistry, 1987, Volume 56, 365-394.
[2] J. J. Skehel, D. C. Wiley, Receptor Binding and Membrane Fusion in Virus Entry: The Influenza
Hemagglutinin. Annual Review of Biochemistry, 2000, Volume 69, 531-569.
[3] S. M. Luoh, M. W. McGregor, V. S. Hinshaw, Hemagglutinin mutations related to antigenic variation
in H1 swine influenza viruses. Journal of Virology, 1992, Volume 66, 1066-1073.
[4] M. Igarashi, K. Ito, R. Yoshida, D. Tomabechi, H. Kida, A. Takada, Predicting the Antigenic Structure
of the Pandemic (H1N1) 2009 Influenza Virus Hemagglutinin. PLoS One, 2010, Volume 5(1), e8553.
[5] W. G. Laver, G. M. Air, R. G. Webster, S. J. Smith-Gill, Epitopes on protein antigens: misconceptions
and realities. Cell, 1990, Volume 61, 553-556.
[6] A. Górecki, J. Leluk, B. Lesyng, Identification and free energy simulations of correlated mutations in
proteins, RECOMB2005, Cambridge MA, USA, Abstracts, 2005.
New approach to de-novo genome assembly using string graph fork detection
technique and longest contig path scaffolding method
Wojciech Frohmberg,1, ∗ Michal Kierzynka,1 Piotr Żurkowski,1 Pawel Wojciechowski,1 and Jacek Blażewicz1
1 Poznan University of Technology
When creating a method for genome assembly, there is a considerable temptation to simplify the
problem, forget about the global string graph structure and use only the local information
provided by the reads produced by the sequencer. After initial preprocessing of the data, a naive
method could simply perform a greedy traversal to yield as long a sequence of reads as possible,
which would satisfy the N50 [1] criterion of assembly quality. The consensus alignment of such
a sequence of reads would probably not even match the genome from which the reads were
taken. This is because repeated genome fragments make the string graph structure full
of cycles and parallel paths.
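The N50 measure referred to above can be computed as in the following minimal Python sketch. It uses the usual definition (the length of the shortest contig in the smallest set of longest contigs that covers at least half of the total assembly length). The contig lengths are hypothetical values chosen only for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty assembly)."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # at least half of the assembly is covered
            return length
    return 0


# Hypothetical assemblies with the same total length of 290.
fragmented_n50 = n50([80, 70, 50, 40, 30, 20])
single_contig_n50 = n50([290])
```

The second call shows why N50 alone can be misleading: a greedy traversal that glues everything into one long but possibly incorrect contig obtains the highest possible score, because N50 measures contiguity and not correctness.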
FIG. 1. Complex structure of the genome string graph [2]
This reasoning led us to consider a method that would find only those read
sequences that are certain to yield contigs matching the genome the reads were taken from,
and then run further analysis to determine whether they can be joined without a loss of quality.
To solve the problem of finding error-free contigs, we propose methods for discovering string graph forks.
The poster gathers tests comparing our methods to the most commonly used de-novo
genome assemblers. It also states the differences of the algorithm at each of its steps.
∗ To whom correspondence should be addressed: [email protected]
[1] J.R. Miller, S. Koren, G. Sutton, Assembly algorithms for next-generation sequencing data, Genomics
95, 315–327, 2010.
[2] C. T. Skennerton, M. Imelfort, and G. W. Tyson, Crass: identification and reconstruction of CRISPR
from unassembled metagenomic data, Nucleic Acids Research, p. 183, 2013.
Predicting the age status of somatic mutations by Gaussian mixture
decomposition
Mateusz Garbulowski1, 2, ∗ and Andrzej Polański1
1 Institute of Computer Science, Silesian University of Technology, Gliwice, Poland
2 Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
Evolution of cancer tissues is explained by the clonal theory. Therefore, estimating the clonal
structure of cancer cell populations allows tumor growth properties and responses
to therapies to be predicted. Next generation sequencing (NGS) techniques applied to tumor tissues
make it possible to develop new scenarios for researching the evolution of cancer tissues. Reading
DNA sequences of tumor and normal cells and comparing them allows the discovery of sets of somatic
driver mutations, which are related to tumor clonal growth. Numerous studies devoted to the
development of methods for the analysis of NGS cancer genomics data are appearing in the literature.
In the presented study we show a methodology for the analysis of whole exome sequencing (WES)
data oriented towards the discovery of driver mutations and the estimation of their age status:
early (clonal) or late (subclonal). We accept the hypothesis [1] that multiple passenger mutations
accumulate in a cancer cell before a driver mutation causes a clonal expansion. We have designed a
system for WES data analysis of glioblastoma multiforme (GBM) cancer cells, leading to the discovery
of lists of driver and passenger somatic mutations. We decompose the obtained histograms of variant
allele frequencies (VAF) into Gaussian mixtures of probability distributions, and we estimate the
age status of driver and passenger mutations, measured from the time of tumor origination, by the
scaled weights of the Gaussian components.
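The decomposition step described above can be illustrated with a small expectation-maximization (EM) fit of a one-dimensional Gaussian mixture. The sketch below is only an assumption about how such a decomposition might look in Python; the function name, the deterministic initialization and the toy VAF values are hypothetical, and the scaling of weights into an age estimate used by the authors is not reproduced.

```python
import math


def fit_gmm_1d(values, k=2, iterations=100, min_variance=1e-6):
    """Fit a k-component 1D Gaussian mixture with plain EM.

    Returns (weights, means, variances), with components sorted by mean.
    """
    n = len(values)
    lo, hi = min(values), max(values)
    overall_mean = sum(values) / n
    # Deterministic start: means spread evenly over the data range.
    means = [lo + (hi - lo) * j / (k - 1) for j in range(k)] if k > 1 else [overall_mean]
    overall_var = max(sum((v - overall_mean) ** 2 for v in values) / n, min_variance)
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            dens = [
                w * math.exp(-((v - m) ** 2) / (2 * s)) / math.sqrt(2 * math.pi * s)
                for w, m, s in zip(weights, means, variances)
            ]
            total = sum(dens) or 1.0
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0.0:
                continue  # empty component: keep its previous parameters
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            var = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, values)) / nj
            variances[j] = max(var, min_variance)

    order = sorted(range(k), key=lambda j: means[j])
    return ([weights[j] for j in order],
            [means[j] for j in order],
            [variances[j] for j in order])


# Hypothetical VAF values: a low-frequency group near 0.10 and a group near 0.50.
TOY_VAF = [0.08, 0.09, 0.10, 0.11, 0.12, 0.48, 0.49, 0.50, 0.51, 0.52]
weights, means, variances = fit_gmm_1d(TOY_VAF)
```

For this toy input the two fitted components separate the low-frequency and high-frequency groups, and the component weights give the fraction of mutations assigned to each group.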
We also show that the ratio of the numbers of subclonal and clonal somatic mutations is a strong
predictor of patient survival.
∗ To whom correspondence should be addressed: [email protected]
[1] A. Sottoriva et al., Intratumor heterogeneity in human glioblastoma reflects cancer evolutionary dynamics, Proceedings of the National Academy of Sciences, 2013, 110, 4009–4014
Complete Petri Net Study of Medically Relevant Interactions in the Immune
System Model
Anna Gogolinska1, 2, ∗ and Wieslaw Nowak2
1 Faculty of Mathematics and Computer Science in Toruń, Poland
2 Institute of Physics, Nicolaus Copernicus University in Toruń, Poland
The immune system (IS) is our defense against pathogens [1]. The immune response is very
complex and can be divided into many parts and types. A complete study of the whole immune
response, or even of a large part of it, is very difficult and requires appropriate methods.
Mathematical models of the IS are sought in current systems biology studies. Hopefully, such
modeling may be useful in medicine as well.
Petri nets (PNs) [2] are one of the mathematical modeling languages. A PN has the form of a
bipartite graph with two kinds of nodes: places and transitions. Transitions can represent actions
or events, while places are suitable for describing objects or elements and can also contain tokens.
Dynamics in PNs is obtained by transitions transferring tokens from their input places to their
output places. The PN idea is very simple and, at the same time, very flexible and universal.
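The token transfer described above can be sketched as follows. This is a minimal Python illustration of the standard firing rule for a place/transition net with arc weights; the place names and the single transition are hypothetical and are not part of the presented IS model.

```python
def enabled(marking, inputs):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(place, 0) >= weight for place, weight in inputs.items())


def fire(marking, inputs, outputs):
    """Return the marking obtained by firing one enabled transition."""
    if not enabled(marking, inputs):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place, weight in inputs.items():
        new_marking[place] = new_marking.get(place, 0) - weight  # consume tokens
    for place, weight in outputs.items():
        new_marking[place] = new_marking.get(place, 0) + weight  # produce tokens
    return new_marking


# Hypothetical toy net: a resting cell and an antigen are consumed
# to produce one activated cell.
start = {"resting_cell": 2, "antigen": 1, "activated_cell": 0}
activate_in = {"resting_cell": 1, "antigen": 1}
activate_out = {"activated_cell": 1}

after = fire(start, activate_in, activate_out)
can_fire_again = enabled(after, activate_in)
```

After one firing the antigen place is empty, so the transition is no longer enabled. Changing an arc weight or adding a transition changes the behavior without touching the rest of the net, which is the kind of flexibility the formalism offers.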
We have used PNs to create a model of the human IS [3]. The model represents a large part of
the immune response and consists of five parts: the reaction of DC cells, Th lymphocyte activation,
cellular response, humoral response and the reaction of macrophages. The model is rather
comprehensive: it consists of about 100 places and 100 transitions. To the best of our knowledge it
is the most complex and complete model of the IS to date. Furthermore, the PN formalism allows
previously created Petri nets to be modified easily. The model was used to study different
pathological phenomena: fever, Autism Spectrum Disorder, HIV infection, Adult-Onset Immunodeficiency
Syndrome (AOIS) and ageing of the IS. Those phenomena were added in different ways, appropriate to
the type of changes induced in the IS, for example by adding a new part of the network, creating new
transitions or modifying the weights in the PN. For every studied case, PN simulations were
performed and the behavior of the model was monitored. The time evolution of some components of the
IS will be presented on the poster as selected plots. The whole IS model will also be outlined.
∗ To whom correspondence should be addressed: [email protected]
[1] J. Parkin and B. Cohen, "An overview of the immune system," The Lancet, vol. 357, pp. 1777–1789,
2001.
[2] W. Reisig, Understanding Petri Nets, Springer, 2013.
[3] A. Gogolinska and W. Nowak, "Petri Nets Approach to Modeling of Immune System and Autism," in
Artificial Immune Systems, vol. 7597, C. Coello Coello et al., Eds., Springer Berlin / Heidelberg,
2012, pp. 86–99.
Modification of the accessibility of the human soluble epoxide hydrolase active
site, in silico study
Sandra Goldowska,1, ∗ Karolina Markowska,1 Alicja Pluciennik,1 and Artur Góra1
1 Tunneling Group, Biotechnology Centre, Silesian University of Technology, ul. Krzywoustego 8, 44-100 Gliwice, Poland
A gate can reversibly control the access of various molecules transported through tunnels to and
from the active site of an enzyme. The open or closed conformation of a gate may be stabilized by
anchoring residues. This mechanism can be crucial for enzyme selectivity and/or specificity, and
can regulate the rate-determining step of catalysis [1].
The aim of this study was to: i) identify gates and anchoring residues in human soluble epoxide
hydrolase; ii) propose and investigate changes which can modify the access to the active site of the
selected enzyme. The active site of human soluble epoxide hydrolase is buried inside the protein
structure and connected with the environment by tunnels. Epoxide hydrolases catalyze the conversion
of epoxides to their corresponding diols by water addition. These enzymes play an important role in
the proper functioning of organisms because they are involved in drug metabolism and detoxification
of xenobiotics [2].
To achieve the proposed goals and investigate the possibilities of controlling the tunnel dynamics,
an in silico study of human soluble epoxide hydrolase was performed. The Amber14 package was
used to run and analyse a 50 ns molecular dynamics simulation [3]. Caver 3.02 software was used for
tunnel identification [4]. Gating and anchoring residues were detected based on an analysis
of changes in amino acid conformations. The analysis identified the Phe497 residue as
a gate, which modifies the throughput and opening of the main tunnels providing access to the active
site, and His524, which forms part of the active site, as an anchoring residue.
Rational design of mutants aimed at modifying the access/exit control system was based on the
results of the in silico study and an analysis of amino acid conservation
at preselected positions. The proposed mutations were introduced with FoldX, and a combination of MD
simulations and CAVER methods was used to analyse changes in the designed variants.
The work is supported by a grant SONATA-BIS 2013/10/E/NZ1/00649 financed by the National
Science Centre Poland (www.ncn.gov.pl).
∗ To whom correspondence should be addressed: [email protected]
[1] A. Góra, J. Brezovsky, J. Damborsky, Chemical Reviews, 2013, 113, 5871-5923
[2] R. Thalji, J. McAtee, S. Belyanskaya, M. Brandt, G. Brown, M. Costell, Y. Ding, J. Dodson, S. Eisennagel, R. Fries, J. Gross, M. Harpel, D. Holt, D. Israel, L. Jolivette, D. Krosky, H. Li, Q. Lu, T.
Mandichak, T. Roethke, C. Schnackenberg, B. Schwartz, L. Shewchuk, W. Xie, D. Behm, S. Douglas,
A. Shaw, J. Marino, Bioorganic & Medicinal Chemistry Letters, 2013, 23, 3584-3588
[3] D.A. Case, V. Babin, J.T. Berryman, R.M. Betz, Q. Cai, D.S. Cerutti, T.E. Cheatham III, T.A. Darden,
R.E. Duke, H. Gohlke, A.W. Goetz, S. Gusarov, N. Homeyer, P. Janowski, J. Kaus, I. Kolossvary, A.
Kovalenko, T.S. Lee, S. LeGrand, T. Luchko, R. Luo, B. Madej, K.M. Merz, F. Paesan, D.R. Roe, A.
Roitberg, C. Sagui, R. Salomon-Ferrer, G. Seabra, C.L. Simmerling, W. Smith, J. Swails, R.C. Walker,
J. Wang, R.M. Wolf, X. Wu, P.A. Kollman, AMBER 14, University of California, San Francisco, 2014
[4] E. Chovancová, A. Pavelka, P. Benes, O. Strnad, J. Brezovsky, B. Kozlikova, A. Gora, V. Sustr, M.
Klvana, P. Medek, L. Biedermannova, J. Sochor, J. Damborsky, PLoS Computational Biology, 2012,
8(10), e1002708
New method of Sholl analysis
Judyta Jablońska1, 2, ∗ and Zbigniew Soltys2
1 Department of Neuroanatomy, Institute of Zoology, Jagiellonian University
2 Bioinformatics Students' Scientific Circle, Jagiellonian University
The nervous system consists of neuronal and glial cells, which stand out from other cell types
as they have numerous, more or less branched processes. Neurobiological research has revealed that
the complexity and variability of cellular morphology influence the mode of information processing.
Changes in the structure of such processes are visible under physiological conditions, but they
may also indicate various disorders, from neurodegenerative diseases to psychiatric disturbances.
Numerous methods for the quantitative analysis of neuronal and glial morphology have
been developed so far, including the oldest but still most commonly used one, Sholl analysis.
Several solutions have been proposed to perform it semi- or fully automatically, but each of them
has its own limitations. To overcome some of these problems, we propose a
simple, open-source, fast and automatic method of Sholl analysis, to be applied to digital
3D reconstructions of cell shapes. In addition, this method allows us to define a new parameter
describing cell morphology, termed 'straightness'.
FIG. 1. Sholl analysis is based on calculating the number of intersections against the radial distance from
the soma centre and results in the graph called Sholl profile.
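The counting step described in the caption above can be sketched for a traced 3D reconstruction as follows. This is a minimal Python illustration under simplifying assumptions, not the authors' method: the cell is assumed to be given as a list of straight segments, a segment is counted when its endpoints lie on opposite sides of a sphere, and the toy coordinates are hypothetical.

```python
import math


def sholl_profile(segments, centre, radii):
    """Count, for each radius, the segments that cross the sphere of that radius.

    segments is a list of ((x, y, z), (x, y, z)) pairs from a traced cell.
    A segment is counted when its endpoints lie on opposite sides of the sphere,
    which is adequate when segments are short compared with the radius step.
    """
    profile = []
    for r in radii:
        count = 0
        for start, end in segments:
            d_start = math.dist(start, centre)
            d_end = math.dist(end, centre)
            if (d_start < r) != (d_end < r):
                count += 1
        profile.append(count)
    return profile


# Hypothetical toy cell: one process leaving the soma and forking once.
TOY_SEGMENTS = [
    ((0, 0, 0), (10, 0, 0)),     # primary process
    ((10, 0, 0), (20, 10, 0)),   # first branch
    ((10, 0, 0), (20, -10, 0)),  # second branch
]
toy_profile = sholl_profile(TOY_SEGMENTS, (0, 0, 0), [5, 15, 30])
```

Plotting the returned counts against the radii gives the Sholl profile; here one intersection is found close to the soma, two after the fork, and none beyond the tips of the branches.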
∗ To whom correspondence should be addressed: [email protected]
[1] K.E. Binley, W.S. Ng, J.R. Tribble, B. Song and J.E. Morgan (2014) Sholl analysis: a quantitative
comparison of semi-automated methods, Journal of Neuroscience Methods 225: 65–70.
[2] R.C. Cannon, M.-O. Gewaltig, P. Gleeson, U.S. Bhalla, H. Cornelis, M.L. Hines, F.W. Howell, E. Muller,
J.R. Stiles, S. Wils and E. de Schutter (2007) Interoperability of Neuroscience Modeling Software:
Current Status and Future Directions, Neuroinform 5: 127–138.
[3] T.A. Ferreira, A.V. Blackman, J. Oyrer, S. Jayabal, A.J. Chung, A.J. Watt, P.J. Sjöström and
D.J. van Meyel (2014) Neuronal morphometry directly from bitmap images, Nature Methods 11:
982–984.
[4] J.C. Gensel, D.L. Schonberg, J.K. Alexander, D.M. McTigue and P.G. Popovich (2010) Semi-automated
Sholl analysis for quantifying changes in growth and differentiation of neurons and glia, Journal of
Neuroscience Methods 190: 71–79.
[5] H. Gutierrez and A.M. Davies (2007) A fast and accurate procedure for deriving the Sholl profile in
quantitative studies of neuronal morphology, Journal of Neuroscience Methods 163: 24–30.
Over-minimization in molecular dynamics simulations
R. Jakubowski,1, ∗ J. Rydzewski,1 and W. Nowak1
1 Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Torun, Poland
Molecular dynamics (MD) is a routinely used method that allows exploration of the conformational
landscape of molecules. For proteins, the commonly used protocol usually includes
an energy minimization via geometry optimization. However, there are no clear
indications of how many minimization steps one should choose, and the tendency is to set a high
number, as most algorithms are computationally cheap these days.
In this work we present the results of our research on one of the most popular minimization
algorithms, the conjugate gradient method (CGM). We describe the meaning of the term
"over-minimization", which denotes an unwanted growth of the entropic contributions when
too many steps of CGM calculations are performed. We also provide a new measure based on the
Pareto front of total entropy, which facilitates the choice of the right structure from the
minimization process in terms of thermodynamic properties.
This work was published in [1].
∗ To whom correspondence should be addressed: [email protected]
[1] J. Rydzewski, R. Jakubowski, W. Nowak, J. Chem. Phys., 2015, 143, 171103.
Influence of the artificial protein nanotube on a cell membrane
Patryk Kosiorek1, ∗ and Lukasz Peplowski1, †
1 Theoretical Molecular Biophysics Group; Institute of Physics; Faculty of Physics, Astronomy and Informatics; Nicolaus Copernicus University; Grudziadzka 5, 87-100 Torun, Poland
Nanotechnology provides us with information about the mechanical properties of
nanotubes, both carbon and protein, and about their varied influence on the cell membrane.
Numerous studies have addressed the clasping, stretching and piercing of lipid membranes [1].
The aim of this work is to create a new artificial Protein NanoTube (PNT) model based on
a structure present in nature [2, 3], and to determine the properties of its interactions with a
model cell membrane. An in-house script was used to build the PNT. Model construction relies on a
fragment duplication method based on a PDB structure. The model was used to pierce the cell
membrane and to pull the embedded PNT out of it.
Our simulations show that the PNT can pierce the cell membrane with a force of about 1800 pN, but in
the last phase of membrane piercing the protein starts to unfold. When the embedded PNT is removed
from the membrane, the forces applied are smaller, about 1000 pN. In such simulations the protein
preserves its native conformation.
All simulations were performed using steered molecular dynamics
[4], with the NAMD 2.10 code [5] and the CHARMM force field [6].
∗ To whom correspondence should be addressed: [email protected]
† To whom correspondence should be addressed: [email protected]
[1] R. Garcia-Fandino, J.L. Trick, A. Pineiro, M.S.P. Sansom, ACS Nano, 2016, Volume 10, 3693–3701.
[2] S. Heten, M.J. Buehler, Volume 197, 2008, Pages 3203–3214.
[3] K. Brown, F. Pompeo, S. Dixon, D. Mengin-Lecreulx, Ch. Cambillau, Y. Bourne, 1999, Volume 18,
4096–4106.
[4] H. Lu, B. Isralewitz, A. Krammer, V. Vogel, K. Schulten, 1998, Biophys J., 75, 662–71.
[5] J.C. Phillips, R. Braun, W. Wang, J. Gumbart, E. Tajkhorshid, E. Villa, Ch. Chipot, R.D. Skeel,
L. Kale, and K. Schulten, J. Comp Chem, 2005, 26:1781–1802.
[6] A.D. MacKerell, Jr., B. Brooks, C.L. Brooks, III, L. Nilsson, B. Roux, Y. Won, and M. Karplus,
The Encyclopedia of Comp. Chem, 1998, 271-277.
Searching for structural patterns in the vicinity of microRNA in plants
Joanna A. Kowalska,1, ∗ Katarzyna Tomczyk,1 Agnieszka Mickiewicz,2 Joanna Sarzynska,2 and Marta Szachniuk1, 2
1 Institute of Computing Science & European Centre for Bioinformatics and Genomics, Poznan University of Technology
2 Institute of Bioorganic Chemistry PAS
MicroRNA (miRNA), an evolutionarily ancient molecule of about 19-24 nt, has attracted the attention
of scientists over the last few decades. So far, it has been recognized as a regulator of gene
expression at the post-transcriptional level. In plants, miRNA plays a key role in the response to
stress conditions, such as sudden changes of temperature, drought or nutrient deficiency, and it
participates in the process of growth and development.
Biogenesis of both plant and animal miRNA starts in the nucleus. The hairpin loop structure of animal
pre-microRNA is exported to the cytoplasm, where it is cleaved to form the miRNA-miRNA*
duplex. In contrast, plant miRNA is exported to the cytoplasm after the duplex has been created.
The other significant difference between miRNA in animals and plants lies in the enzymes mediating
the process of miRNA maturation. Dicer, an RNase III enzyme, serves as a 'molecular ruler' in animals.
After recognizing the 5' end and/or the 2-nt 3' overhang, Dicer measures a distance of 22 nt
and cleaves the duplex [1]. Dicer Like 1 (DCL1), a homologue of Dicer, is
responsible for cutting out the miRNA-miRNA* duplex in plants. The mechanism of action of
the latter enzyme still remains unclear. Preliminary analysis of sequence, secondary and tertiary
structures could help discover how plant microRNAs are recognized by the enzyme within their
precursors.
Herein, we present a bioinformatics approach aimed at supporting the understanding of plant
microRNA biogenesis. Using a set of available bioinformatics tools, among others WebLogo [2],
RNAstructure [3], MCQ4Structures [4], RNAComposer [5] and Swiss-PdbViewer [6], together with our
own scripts, we try to identify structural patterns in the vicinity of miRNA in plants. In the
search for patterns we consider the primary, secondary and tertiary structures of available
pre-miRNAs, focusing on the neighborhood of the miRNA-miRNA* duplex. We present potential motifs
identified in sequence and secondary structure. We also demonstrate the first results of tertiary
structure analysis based on predicted 3D models of miRNA precursors.
∗
To whom the correspondence should be addressed: [email protected]
[1] Ha M, Kim N, Regulation of microRNA biogenesis, Nature Reviews Molecular Cell Biology, 2014, 15,
509–524,
[2] Crooks GE, Hon G, Chandonia JM, Brenner SE, WebLogo: A sequence logo generator, Genome Research, 2004, 14, 1188-1190,
[3] Reuter J S, Mathews D H, RNAstructure: software for RNA secondary structure prediction and analysis,
BMC Bioinformatics, 2010, 11,
[4] Zok T, Popenda M, Szachniuk M, MCQ4Structures to compute similarity of molecule structures, Central
European Journal of Operations Research, 2014, 22, 457-473,
[5] Popenda M, Szachniuk M, Antczak M, Purzycka K J, Lukasiak P, Bartol N, Blazewicz J, Adamiak R
W, Automated 3D structure composition for large RNAs, Nucleic Acids Research, 2012, 14,
[6] Guex N, Peitsch MC, SWISS-MODEL and the Swiss-PdbViewer: An environment for comparative
protein modeling, Electrophoresis 18, 1997, 15, 2714-2723.
PyRosetta energy terms as indicators for protein mirror models
Monika Kurczyńska,1, ∗ Bogumil M. Konopka,1 and Malgorzata Kotulska1
1
Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology,
Wroclaw University of Science and Technology, Wroclaw, Poland
In May 2016 the number of known protein sequences was 64,000,000, 21 times more
than 10 years earlier [1]. Over the same period the number of protein structures in the Protein Data
Bank [2] increased only 3-fold and now equals 110,000. Protein structure modelling methods are being
developed to reduce this disparity between known primary and tertiary protein structures.
Tools for protein structure reconstruction from a contact map generate a collection of models containing both properly oriented models and mirror models, because all of them share the same contact map.
Properly oriented protein models and mirror models can constitute competitive forms in nature.
Our main goal is to identify indicators that could be useful for distinguishing mirror models
without a priori knowledge of the structure. We assumed that some of the PyRosetta energy
terms would differ significantly between mirror models and properly oriented models.
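The statement that a model and its mirror image share one contact map can be illustrated with a short sketch. It is not part of the C2S pipeline; the helix coordinates, the 8 Å cutoff and all function names are assumptions chosen only for illustration. The sketch reflects a toy C-alpha trace, confirms that the contact map is unchanged, and shows that a handedness measure (the signed volume spanned by three consecutive bond vectors) flips its sign.

```python
import math
from itertools import combinations

def contact_map(coords, cutoff=8.0):
    """Return the set of residue index pairs closer than `cutoff` (in Å)."""
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
        if math.dist(a, b) < cutoff
    }

def mirror(coords):
    """Reflect all points through the yz-plane (x -> -x)."""
    return [(-x, y, z) for x, y, z in coords]

def signed_volume(p0, p1, p2, p3):
    """Scalar triple product of three consecutive bond vectors; its sign encodes handedness."""
    u = [b - a for a, b in zip(p0, p1)]
    v = [b - a for a, b in zip(p1, p2)]
    w = [b - a for a, b in zip(p2, p3)]
    cross = (v[1] * w[2] - v[2] * w[1],
             v[2] * w[0] - v[0] * w[2],
             v[0] * w[1] - v[1] * w[0])
    return sum(ui * ci for ui, ci in zip(u, cross))

# Hypothetical C-alpha trace: an idealized right-handed helix (illustrative values only).
helix = [(2.3 * math.cos(math.radians(100 * i)),
          2.3 * math.sin(math.radians(100 * i)),
          1.5 * i) for i in range(12)]
mirrored = mirror(helix)

same_contacts = contact_map(helix) == contact_map(mirrored)   # distances survive reflection
chirality_original = signed_volume(*helix[:4])                # positive for this helix
chirality_mirrored = signed_volume(*mirrored[:4])             # sign flips in the mirror image
```

Because every pairwise distance is preserved, any score built only from distances cannot separate the two forms, which is why chirality-sensitive terms are the natural candidates for indicators.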
In our work we used protein models which were reconstructed from contact maps of experimental
SCOP domains [3] with our tool - C2S pipeline [4]. The original SCOP domains are organized in
classes based on the similarity of their secondary structures. We investigated 100 models for each
of 1305 domains. With Biopython [5] we calculated structural features of the models and with
PyRosetta [6] we computed the energy terms, whose linear combination is the total energy of the
model.
The C2S pipeline generates mirror models and properly oriented models with the same probability of
0.5. However, for some domains the percentage of mirror models was lower than 5% or higher than
95%. The structural quality of the properly oriented models and mirror models is comparable. The
mean RMSD of the properly oriented models, compared to the original SCOP structures, equaled
5.6 Å with a standard deviation of 5.4 Å, while the mean RMSD of the mirror models compared to the
ideal mirror images of the original SCOP structures was 5.6 Å with a standard deviation of 5.5 Å. In
all-alpha domains the energy term which describes electrostatic energy (hack_elec) was the most
reliable indicator distinguishing properly oriented from mirror models (for 77% of domains). In addition,
the energy terms related to the probability of an amino acid at given dihedral angles Φ and Ψ (p_aa_pp) and
to the Ramachandran preferences (rama) were statistically different for 68% and 64% of domains, respectively.
Despite the intuition that the mirror images of proteins rich in alpha-helices are easier to
identify, we observed more energy terms which were significantly different for more than 75% of
domains. These energy terms were again rama and p_aa_pp, and additionally the attractive and repulsive
portions of the Lennard-Jones potential (fa_atr, fa_rep) and the Lazaridis-Karplus solvation energy
(fa_sol).
∗
To whom the correspondence should be addressed: [email protected]
[1] The UniProt Consortium, Nucleic Acids Res, 2015, 43, D204-D212.
[2] H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne,
Nucleic Acids Res, 2000, 28, 235-242.
[3] A.G. Murzin, S.E. Brenner, T. Hubbard, C. Chothia, J Mol Biol, 1995, 247, 536–540.
[4] B.M. Konopka, M. Ciombor, M. Kurczynska, M. Kotulska, J Membr Biol, 2014, 247, 409–420.
[5] P.J. Cock, T, Antao, J.T. Chang, B.A. Chapman, C.J. Cox, A. Dalke, I. Friedberg, T. Hamelryck, F.
Kauff, B. Wilczynski, M.J. de Hoon, Bioinformatics, 2009, 25, 1422–1423.
[6] S. Chaudhury, S. Lyskov, J.J. Gray, Bioinformatics, 2010, 26, 689–691.
Application of molecular docking for predicting protein substrate specificity
Michal Laźniewski,1, 2, ∗ Krzysztof Kuchta,1 Dariusz Plewczyński,3 and Krzysztof Ginalski1
1
Laboratory of Bioinformatics and Systems Biology,
Centre of New Technologies, University of Warsaw, Poland
2
Department of Physical Chemistry, Faculty of Pharmacy, Medical University of Warsaw, Poland
3
Laboratory of Functional and Structural Genomics,
Centre of New Technologies, University of Warsaw, Poland
In recent years, the application of next generation sequencing brought a substantial increase in
the number of known protein sequences. The Refseq database alone now contains more than 60
million unique sequences, 50 times more than only ten years ago. For such an overwhelming amount
of data, functional assignment using only experimental techniques is a challenging task. With
structural genomics providing thousands of new protein structures [1], the additional information
might be utilized by structure-based methods to improve the quality of bioinformatics predictions.
Thus molecular docking might prove invaluable where other theoretical techniques fail to predict
the function of an analyzed protein.
It has already been shown that molecular docking can frequently predict the correct conformation of a small compound in a protein-ligand complex; however, calculating the binding energy
remains a challenge [2]. On the other hand, with most efforts focused on studying the
interactions between proteins and drug-like inhibitors, only limited emphasis has been put on analyzing proteins and their in vivo small-molecule partners [3]. The goal of our
work was a comprehensive analysis of the performance of various molecular docking algorithms
in wide-scale predictions of substrate specificity of proteins from a single organism – Escherichia
coli. We analyzed all E. coli enzymes for which the crystal structure was solved together with
their cognate partner (the substrate or product of the reaction). Specifically, two sets of molecules
were docked to each enzyme: (a) compounds present in the active sites of all selected proteins
and (b) the entire metabolome of E. coli. The performance of four different programs (GOLD,
eHiTs, Surflex and Glide) was tested, including both the ability to identify the enzyme’s cognate
ligand and to correctly predict the ligand conformation. Moreover, we discuss whether applying machine
learning methods could further increase the quality of such structure-based functional predictions.
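One way to quantify the ability to identify the cognate ligand is to treat each enzyme as a ranking task. The following minimal sketch assumes that a lower score means a better pose (scoring conventions differ between docking programs); the ligand names, scores and function names are hypothetical and are not taken from the study.

```python
def cognate_rank(scores, cognate):
    """1-based rank of the cognate ligand when ligands are sorted by score (lower = better)."""
    ordered = sorted(scores, key=scores.get)
    return ordered.index(cognate) + 1

def top_k_success(results, k):
    """Fraction of enzymes whose cognate ligand is ranked within the top k."""
    hits = sum(1 for scores, cognate in results if cognate_rank(scores, cognate) <= k)
    return hits / len(results)

# Hypothetical docking scores (arbitrary units) for two enzymes; all names and values are made up.
results = [
    ({"ATP": -9.1, "glucose": -6.3, "pyruvate": -5.0}, "ATP"),
    ({"ATP": -8.2, "glucose": -7.9, "pyruvate": -4.1}, "glucose"),
]
ranks = [cognate_rank(scores, cognate) for scores, cognate in results]
success_top1 = top_k_success(results, k=1)
success_top2 = top_k_success(results, k=2)
```

Reporting the success rate for several values of k gives a simple way to compare programs on the same set of enzymes and candidate compounds.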
∗
To whom the correspondence should be addressed: [email protected]
[1] J. Weigelt, Exp cell res, 2010, 316(8), 1332-1338.
[2] D. Plewczynski, M. Lazniewski, et al., J Comput Chem, 2011, 32(4), 742-755.
[3] A. Macchiarulo, I. Nobeli, et al., Nat Biotechnol, 2005, 22(8), 1039-1045.
PANTA RHEI: analysis of solvent flow in MD simulations
Tomasz Magdziarz,1, ∗ Karolina Markowska,1 Sandra Goldowska,1 and Artur Góra1
1
Tunneling Group, Biotechnology Centre, Silesian University of Technology, 44100 Gliwice, Poland
The proper action of enzymes depends on a multitude of factors, for example temperature,
pH, or the solvent, which usually consists of water and some ions. The solvent is an especially
interesting factor because it contributes to the catalytic stability, activity and selectivity of proteins.
Natural evolution has developed a variety of mechanisms regulating water access to the active site. The
division into hydrophobic and hydrophilic compartments in protein cores can separate processes
requiring distinct dielectric conditions. In enzymes with a buried active site, connected with
the surrounding solvent by tunnels, the water flow can be controlled by the molecular properties of the amino
acids forming the tunnels or, in more sophisticated enzymes, by gates controlling the opening and
closing of the access pathways. Detailed information about the flow of molecules through the
tunnel network can significantly improve our understanding of the mechanisms controlling enzyme
activity.
In past years several tools for tunnel identification have been developed. The most recent, such as
CAVER 3.0 [1] or Mole 2.0 [2], can facilitate the analysis of molecular dynamics simulations and make it possible
to gather precise information about the geometry of detected pathways and their persistence in
time.
However, knowledge of the geometrical properties of existing tunnels, approximated by
spheres penetrating the empty space in proteins, can only suggest routes by which solvent molecules (and ligands)
enter and exit. Parameters like the length of a tunnel, its diameter and even the properties of the amino acids
that build the tunnel do not allow for easy identification of the major factors controlling the flow
of water molecules.
Here we present a novel tool for the analysis of solvent flow in molecular dynamics simulations.
AQUEDUCT, a software package developed in our group, allows extraction, analysis and visualization of the behavior of solvent molecules during the entire simulation. In explicit-solvent
MD simulations, enzymes are immersed in a water box containing thousands of water molecules. Analysis of this bulk water can provide, for example, insight into the overall distribution and density of
water molecules. AQUEDUCT, on the other hand, takes a different approach: it traces the particular
water molecules that enter or interact with the active site. This leads to a complete picture of
solvent flow to, from, and around the active site. Moreover, it allows for a better understanding of
the factors that control the flow of water molecules and their impact on enzyme activity.
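The idea of tracing only those molecules that reach the active site can be sketched as follows. This is not the AQUEDUCT implementation or its API; it assumes that the active site can be approximated by a sphere and that the trajectory is available as a list of frames mapping molecule identifiers to coordinates. All names and values are hypothetical.

```python
import math

def trace_molecules(trajectory, site_center, site_radius):
    """Return {molecule_id: [frame indices]} for molecules that visit the site sphere.

    trajectory: list of frames, each a dict {molecule_id: (x, y, z)}.
    """
    visits = {}
    for frame_index, frame in enumerate(trajectory):
        for mol_id, position in frame.items():
            if math.dist(position, site_center) <= site_radius:
                visits.setdefault(mol_id, []).append(frame_index)
    return visits

def full_paths(trajectory, visits):
    """Collect the complete path of every molecule that visited the site at least once."""
    return {mol_id: [frame[mol_id] for frame in trajectory if mol_id in frame]
            for mol_id in visits}

# Toy trajectory with two water molecules (made-up coordinates in Å).
trajectory = [
    {"W1": (10.0, 0.0, 0.0), "W2": (20.0, 0.0, 0.0)},
    {"W1": (2.0, 0.0, 0.0), "W2": (20.0, 0.0, 0.0)},
    {"W1": (9.0, 0.0, 0.0), "W2": (20.0, 0.0, 0.0)},
]
visits = trace_molecules(trajectory, site_center=(0.0, 0.0, 0.0), site_radius=4.0)
paths = full_paths(trajectory, visits)
```

Here only W1 is traced, and its whole path (approach, visit, departure) is kept, while the bulk molecule W2 is ignored.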
Applications of AQUEDUCT are not limited to water molecules. It is a universal tool and,
together with the previously described tools, can provide a comprehensive description of the protein tunnel
network and its accessibility to and usage by different molecules.
The work is supported by National Science Centre Poland grant SONATA-BIS 2013/10/E/NZ1/00649.
∗
To whom the correspondence should be addressed: [email protected]
[1] E. Chovancova, A. Pavelka, P. Benes, O. Strnad, J. Brezovsky, B. Kozlikova, A. Gora, V. Sustr, M.
Klvana, P. Medek, L. Biedermannova, J. Sochor, J. Damborsky, PLoS Comput Biol, 2012, 8, e1002708
[2] D. Sehnal, R. Svobodová Vařeková, K. Berka, L. Pravda, V. Navrátilová, P . Banáš, C.-M. Ionescu, M.
Otyepka, J. Koča, J Chemoinform, 2013, 5, 1-13
NGS-based analysis of copy number variations in various cattle breeds
M. Mielczarek,1, 2, ∗ M. Fraszczak,1 E. L. Nicolazzi,3 G. Minozzi,3 H. Schwarzenbacher,4 C.
Egger-Danner,4 D. Vicario,5 F. Seefried,6 A. Rossoni,7 T. Solberg,8 L. Varona,9 C.
Diaz,9, 10 C. Ferrandi,3 R. Giannico,3 J. L. Williams,11 J. Woolliams,12 and J. Szyda1, 2
1
Biostatistics group, Wroclaw University of Environmental and Life Sciences; Wroclaw, Poland
2
National Research Institute of Animal Production; Cracow-Balice, Poland
3
Fondazione Parco Tecnologico Padano; Lodi, Italy
4
ZuchtData EDV-Dienstleistungen GmbH; Vienna, Austria
5
Italian Simmental Cattle Breeders Association; Udine, Italy
6
Swiss Brown Cattle Breeders Federation; Zug, Switzerland
7
Italian Brown Cattle Breeders‘ Association; Bussoleng, Italy
8
Norwegian University of Life Sciences; As, Norway
9
Universidad de Zaragoza; Zaragoza, Spain
10
Instituto Nacional de Investigaciòn Agropecuaria; Madrid, Spain
11
University of Adelaide; Roseworthy, Australia
12
Roslin BioCentre; Roslin, UK
Whole genome DNA sequences were determined for 104 bulls representing Brown Swiss (48
individuals), Fleckvieh (30), Guernsey (20), Simmental (16) and Norwegian Red (23) breeds. The
total number of raw reads obtained for a single animal varied between 270,678,710 (a Fleckvieh)
and 768,980,700 (a Brown Swiss). The average genome coverage ranged from 10 to 28. Alignment
to the UMD3.1 reference genome was carried out using BWA-MEM. CNV calling was performed
with the CNVnator software. The number of duplications per individual varied between 2,204
and 48,501, while the number of deletions varied between 9,771 and 24,334. Six deletions located
on chromosomes 6, 7, 17, 21, 23 and 29 were shared among all animals, while no
duplications were shared among all of them. The length of CNVs varied from 200 bp to 999,300 bp for
deletions, and from 200 bp to 925,000 bp for duplications. In conclusion, significant variation in
genome structure is observed both within and among breeds.
The research was carried out within the EU "Gene2Farm" project (7FP grant No. 289592) and
Polish National Science Centre grant No. UMO-2014/15/N/NZ9/03914. Processing of the raw
data was performed at the Poznan Supercomputing and Networking Center.
∗
To whom the correspondence should be addressed: [email protected]
Theoretical study of influence of mutations on superoxide dismutase SOD1
dimer by molecular dynamics simulations
Przemyslaw Miszta,1, ∗ Cezary Żekanowski,2 Jakub
Fichna,2 Michalina Kosiorek,2 and Slawomir Filipek1
1
Faculty of Chemistry & Biological and Chemical Research Centre, University of Warsaw, Warsaw, Poland
2
Department of Neurodegenerative Disorders, Mossakowski Medical Research Centre,
Polish Academy of Sciences, Warszawa, Poland
The human superoxide dismutase [Cu-Zn], also known as superoxide dismutase 1 or SOD1, plays
a very important role in the regulation of apoptosis in cells. It is an enzyme that in humans is encoded
by the SOD1 gene, located on chromosome 21. Mutations in this gene have been implicated as
causes of familial amyotrophic lateral sclerosis (fALS). The mechanism by which mutant SOD1
exerts toxicity remains unknown. SOD1 is an important antioxidant defense in nearly all living
cells exposed to oxygen.
The structure of the human SOD1 dimer, which contains four binding sites for metal
ions, two for Zn2+ and two for Cu2+, was taken from the X-ray structure deposited in the Protein Data
Bank (PDB id: 2C9V) [1]. The Zn and Cu ion-binding site is built by the following amino acids:
His44, His46, His61, His118, His69, His78, Asp81. The force field parameters and charges for the
Zn-Cu binding site were created by analogy to data obtained from previous quantum chemical
calculations [2, 3]. All energy minimizations (10 000) and molecular dynamics (MD) simulations
were performed in the NAMD program, version 2.10, using the all-atom (37 000 atoms) CHARMM27 force field
[4] in a periodic box (62 × 95 × 62 Å) with Langevin (stochastic) dynamics [5].
The following mutations: K3E, A4V, G41S, G72S, N86S, D90A, G93C, S105L, C11Y, N139D,
L144S, L126X and the native protein were investigated by molecular dynamics simulations. Changes in
the structure of mutated SOD1 were studied during 20 ns all-atom MD simulations in an explicit
water environment. Comparison of the resulting structures of the mutants, superimposed
on the WT protein, revealed how each mutation could influence the structure of the SOD1 dimer and
also how the potential interactions with other proteins could be altered.
∗
To whom the correspondence should be addressed: [email protected]
[1] Strange, R.W., Antonyuk, S.V., Hough, M.A., Doucette, P.A., Valentine, J.S., Hasnain, S.S., J.Mol.Biol.
2006, 356, 1152
[2] Shen, J., Wong, C.F., Subramaniam, S., Albright, T.A., and McCammon, J.A., J. Comp. Chem. 1990;11:
346–350
[3] Branco, RJF; Fernandes, PA; Ramos, MJ, J. Phys. Chem. B, 2006, 110, 16754.
[4] MacKerell, Jr. AD, Banavali N, Foloppe N, Biopolymers, 2001, 56, 257–265.
[5] R. Kubo, M. Toda, N. Hashitsume, Statistical Physics II: Nonequilibrium Statistical Mechanics, Springer,
1991.
Transcriptomic analysis of gene expression data from Bos taurus liver using
RNA-Seq
Pareek C.S.,1, 2, ∗ Walendzik Paulina,1, 2, † Kadarmideen Haja,3 and Kogelman Lisette3
1
Functional Genomics Lab. Faculty of Biology and Environmental Protection,
Nicolaus Copernicus University, Torun, Poland
2
Interdisciplinary Centre of Modern Technology,
Nicolaus Copernicus University, Torun, Poland
3
The Animal Breeding, Quantitative Genetics and System Biology Lab., University of Copenhagen, Denmark
RNA-Seq is a relatively novel technology that can be used to analyze changes in gene
expression across the entire transcriptome and has been applied to a rapidly increasing number
of organisms [1]. In this study we used massively parallel, high-throughput transcriptome
sequencing (RNA-seq) to characterize the bovine liver transcriptome architecture in
three cattle breeds at three developmental ages. The liver is one of the main organs engaged in
the regulation of metabolism, especially in the overall muscle and body growth
of young growing bulls. The bioinformatics analysis was performed using the PARTEK Flow
suite and R software with specific packages to identify significantly differentially expressed
(SDE) genes and the associated over-represented Gene Ontology (GO) terms and Kyoto Encyclopedia
of Genes and Genomes (KEGG) pathways across the whole liver transcriptome of cattle. In total,
we obtained 455 differentially expressed genes from the RNA-Seq data. Detailed analysis of
the Venn diagram of overlapping genes of the Hereford breed between 9 and 12 months
identified two significant genes correlated with fat metabolism: FADS2 (fatty acid desaturase
2) and FASN (fatty acid synthase). Significant GO term results were obtained in just
one analysis, namely within the Hereford breed comparing 9 vs 12 months. Detailed GO analysis
of Hereford 9 vs 12 months identified nine biological pathways, five of which are associated
with fat metabolism. Similarly, the detailed KEGG analysis of Hereford 9 vs 12 months resulted in
the identification of nine pathways, of which four were associated with fat metabolism. The results also
indicate that the comprehensive identification and annotation of unknown transcripts from tissue-specific
transcriptome analysis using RNA-seq data remains a tremendous future challenge.
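Over-representation of a GO term or KEGG pathway among differentially expressed genes is commonly assessed with a hypergeometric (one-sided Fisher) test. The sketch below shows that calculation only as an illustration; it is not claimed to be the procedure used by PARTEK Flow or by the R packages in this study. Apart from the 455 differentially expressed genes, all counts are hypothetical.

```python
from math import comb

def enrichment_p_value(population, annotated, selected, overlap):
    """P(X >= overlap) for a hypergeometric X: upper-tail over-representation test.

    population: number of genes in the background set
    annotated:  number of background genes carrying the term
    selected:   number of differentially expressed genes
    overlap:    number of differentially expressed genes carrying the term
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(comb(annotated, k) * comb(population - annotated, selected - k)
               for k in range(overlap, upper + 1))
    return tail / total

# 455 DE genes come from the abstract; the other three numbers are made up for illustration.
p_value = enrichment_p_value(population=20000, annotated=200, selected=455, overlap=15)
```

In practice the resulting p-values would still need a multiple-testing correction, since many terms are tested at once.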
Supported by the National Science Centre, Krakow, Poland (Project No. 2012/05/B/NZ2/01629).
This master's thesis work was carried out partially during a three-month traineeship at the University
of Copenhagen in the Animal Breeding, Quantitative Genetics and System Biology Group under
the leadership of Professor Haja Kadarmideen.
∗
To whom the correspondence should be addressed: [email protected]
†
Presenting author: [email protected]
[1] McCabe M., Waters S., Morris D., Kenny D., et al., (2012), RNA-seq analysis of differential gene expression in liver from lactating dairy cows divergent in negative energy balance, BMC Genom., 20:193.
The evolutionary glance at tunnels in proteins
Alicja Pluciennik,1, ∗ Michal Stolarczyk,1 Sandra Goldowska,1
Magdalena Lugowska,1 Tomasz Magdziarz,1 and Artur Góra1
1
Tunneling Group, Biotechnology Center, Silesian University of Technology,
ul. Krzywoustego 8, 44-100 Gliwice, Poland
Enzymes employ many strategies of catalysis. One of them is a buried active site connected
to the external protein environment via tunnels. The properties of the amino acids which form such buried
pathways play an important role in the regulation of ligand passage and binding and in the catalytic
properties [1]. Also, changes of residue conformation, as in the case of gating and anchoring amino
acids, can considerably influence the substrate or ligand flow to and from the active site [2].
The conservation of amino acids in protein families makes it possible to predict hot spots for protein
engineering. According to Kimura’s neutral theory of molecular evolution, highly conserved amino acids play a significant role in protein stability and functionality. However, low values of
conservation can point to residues responsible for the adaptation of an enzyme to improve its specificity.
Therefore the variability or conservation of protein tunnels provides us with insight into the
mechanism of enzyme selectivity or specificity. The performed analysis aims to link the evolutionary history
of residues and their function. Thus, the deeper the insight into the evolutionary status of residues, the more
possibilities of obtaining desired enzyme properties.
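Residue conservation is often scored per alignment column. A minimal sketch, assuming a Shannon-entropy-based score (one common choice, not necessarily the measure used in this work) and a made-up four-sequence alignment:

```python
import math
from collections import Counter

def column_conservation(column, alphabet_size=20):
    """Conservation score in [0, 1]: 1 - normalized Shannon entropy of one alignment column."""
    residues = [r for r in column if r != "-"]   # gaps are ignored
    if not residues:
        return 0.0
    counts = Counter(residues)
    n = len(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 1.0 - entropy / math.log2(alphabet_size)

def conservation_profile(alignment):
    """Score every column of an alignment given as equal-length strings."""
    return [column_conservation(column) for column in zip(*alignment)]

# Hypothetical alignment of four homologous sequences (illustrative only).
alignment = ["GHWL", "GHFL", "GAYL", "G-ML"]
profile = conservation_profile(alignment)
```

In this toy example the first and last columns are fully conserved, while the third column is the most variable; mapping such scores onto tunnel-lining residues would separate likely stability positions from candidate specificity positions.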
The work is supported by a grant SONATA-BIS 2013/10/E/NZ1/00649 financed by The National Science Centre Poland (www.ncn.gov.pl).
∗
To whom the correspondence should be addressed: [email protected]
[1] L. Biedermannová, Z. Prokop, A. Gora, E. Chovancová, M. Kovács, J. Damborský, and R. C. Wade, A
Single Mutation in a Tunnel to the Active Site Changes the Mechanism and Kinetics of Product Release
in Haloalkane Dehalogenase LinB, 2012, J. Biol. Chem., vol. 287, no. 34, pp. 29062–29074.
[2] A. Gora, J. Brezovsky, and J. Damborsky, Gates of Enzymes, 2013, Chem. Rev., vol. 113, no. 8, pp.
5871–5923.
Redundans: an assembly pipeline for highly heterozygous genomes
Leszek P. Pryszcz1, 2, ∗ and Toni Gabaldón1, 3, 4
1
Bioinformatics and Genomics Programme. Centre for Genomic
Regulation (CRG). Dr. Aiguader, 88. 08003 Barcelona, Spain
2
International Institute of Molecular and Cell Biology, Warsaw, Poland
3
Universitat Pompeu Fabra (UPF). 08003 Barcelona, Spain
4
Institució Catalana de Recerca i Estudis Avançats (ICREA),
Pg. Lluı́s Companys 23, 08010 Barcelona, Spain
Many genomes display high levels of heterozygosity (i.e. presence of different alleles at the same
loci in homologous chromosomes), those of hybrid organisms being an extreme case. The assembly of highly heterozygous genomes from short sequencing reads is a challenging task because
it is difficult to accurately recover the different haplotypes. When confronted with highly heterozygous genomes, the standard assembly process tends to collapse homozygous regions and reports
heterozygous regions in alternative contigs. The boundaries between homozygous and heterozygous regions result in multiple paths that are hard to resolve, which leads to highly fragmented
assemblies with a total size larger than expected. This, in turn, causes numerous problems in
downstream analyses, e.g. fragmented gene models, wrong gene copy numbers and broken synteny.
To circumvent these caveats we have developed a pipeline that specifically deals with the assembly of heterozygous genomes by introducing a step to recognise and selectively remove alternative
heterozygous contigs. We tested our pipeline on simulated and naturally-occurring heterozygous
genomes and compared its accuracy to other existing tools. Our method was recently published
[1] and it is freely available at https://github.com/lpryszcz/redundans.
∗
To whom the correspondence should be addressed: [email protected]
[1] L. P. Pryszcz, T. Gabaldón, NAR, 2016.
Ranking of RNA models Using Sphere Consensus
Tomasz Ratajczak1, ∗ and Piotr Lukasiak1
1
Institute of Computing Science, Poznan University of Technology, 60-965 Poznan, Poland
During the modeling of an unknown RNA molecule, for which no reference structure is
available, various methods and tools can be used. Most of those tools generate a set of different
models, but they rarely provide a score that can be used to select the best model. Even if
a score is available, the selection of the highest quality structure is usually done through manual
inspection.
Here we present an automatic method, based on a consensus approach, to rank models of a single RNA whose reference structure is unknown.
In our methodology the RNAssess method [1] is used to identify conserved regions of a molecule at various precision levels, capturing both local and global motifs. For
each nucleotide a sphere of a specified radius is calculated and all atoms inside the sphere are selected.
The substructure consisting of those atoms is compared between all models. Candidate models containing more common motifs are ranked higher than models with fewer common structures. Using
a set of sphere radii, both local and global comparisons are performed.
Assuming that models are generated independently, common motifs can indicate properly predicted regions. The method provides a uniform score that can be used to rank RNA 3D models of
the same RNA sequence without knowledge of its 3D structure.
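The sphere-based consensus idea can be sketched in a few lines. This is a simplified illustration, not the RNAssess algorithm: it assumes one representative atom per nucleotide, compares substructures by differences in internal distances rather than by superposition, and ranks models by their mean deviation from all other models. Model names, coordinates and radii are hypothetical.

```python
import math

def sphere_atoms(model, center_index, radius):
    """Indices of atoms within `radius` of the atom at `center_index`."""
    center = model[center_index]
    return [i for i, atom in enumerate(model) if math.dist(atom, center) <= radius]

def local_deviation(model_a, model_b, center_index, radius):
    """Distance-based RMSD between two models over the sphere defined in model_a."""
    selected = sphere_atoms(model_a, center_index, radius)
    pairs = [(i, j) for k, i in enumerate(selected) for j in selected[k + 1:]]
    if not pairs:
        return 0.0
    squared = [(math.dist(model_a[i], model_a[j]) - math.dist(model_b[i], model_b[j])) ** 2
               for i, j in pairs]
    return math.sqrt(sum(squared) / len(pairs))

def consensus_ranking(models, radii):
    """Rank model names by mean local deviation from all other models (lower = closer to consensus)."""
    scores = {}
    for name, model in models.items():
        deviations = [local_deviation(model, other, center, radius)
                      for other_name, other in models.items() if other_name != name
                      for center in range(len(model))
                      for radius in radii]
        scores[name] = sum(deviations) / len(deviations)
    return sorted(scores, key=scores.get)

# Three toy models of the same four-nucleotide molecule; m3 is a deliberate outlier.
models = {
    "m1": [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)],
    "m2": [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0.1, 0)],
    "m3": [(0, 0, 0), (1, 0, 0), (2, 2, 0), (5, 5, 0)],
}
ranking = consensus_ranking(models, radii=[1.5, 10.0])
```

The small radius probes local agreement and the large one global agreement; the outlier m3 ends up last because it disagrees with both other models.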
∗
To whom the correspondence should be addressed: [email protected]
[1] P. Lukasiak, M. Antczak, T. Ratajczak, M. Szachniuk, M. Popenda, R.W. Adamiak, J. Blazewicz, NAR,
2015, Volume 43, W502-W506
Tabu Search Algorithm for RNA Partial Degradation Problem
Agnieszka Rybarczyk,1, 2, ∗ Alain Hertz,3 Marta Kasprzak,1, 2 and Jacek Blażewicz1, 2
1
Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
2
Institute of Bioorganic Chemistry, Polish Academy of Sciences,
Noskowskiego 12/14, 61-704 Poznań, Poland
3
Department of Mathematics and Industrial Engineering,
Ecole Polytechnique and GERAD, Montreal, Canada
In the last few years, great interest in RNA research has been observed due to the
discovery of the role that RNA molecules play in biological systems. They not only serve
as templates in protein synthesis or as adaptors in the translation process but also influence and are
involved in the regulation of gene expression. It has been demonstrated that most of them are produced
from larger molecules through enzymatic cleavage or spontaneous degradation.
In this work, we present our recent results concerning the RNA degradation process.
In our studies we used artificial RNA molecules designed according to the rules of degradation
developed by Kierzek and co-workers [1, 2]. On the basis of the results of their degradation, we
have proposed the formulation of the RNA Partial Degradation Problem (RNA PDP) and we have
shown that the problem is strongly NP-complete [2]. We propose a new efficient
heuristic approach in which two tabu search algorithms cooperate. The algorithm can reconstruct
a given RNA molecule, having as input the results of the biochemical analysis of its degradation,
which possibly contain errors (false negatives or false positives). Results of the computational
experiment, which demonstrate the quality and usefulness of the proposed method, are presented.
∗
To whom the correspondence should be addressed: [email protected]
[1] R. Kierzek, Methods Enzymol., 2001, 341, 657-75.
[2] J. Blazewicz, M. Figlerowicz, M. Kasprzak, M. Nowacka, A. Rybarczyk, Journal of Computational
Biology, 2011, 18, 821-834.
Conformational sampling of a biomolecular rugged energy landscape
J. Rydzewski,1, ∗ R. Jakubowski,1 G. Nicosia,2 and W. Nowak1
1
Institute of Physics, Faculty of Physics, Astronomy and Informatics,
Nicolaus Copernicus University, Grudziadzka 5, 87-100 Torun, Poland
2
Department of Mathematics and Computer Science,
University of Catania, Viale A. Doria, 6-95125 Catania, Italy
Protein structure refinement using conformational sampling is important in current protein
studies. In this paper we examined protein structure refinement by means of potential energy
minimization using immune computing as a method of sampling conformations. The method
was tested on the x-ray structure and 30 decoys of the mutant of [Leu]Enkephalin, a paradigmatic
example of the biomolecular multiple-minima problem. In order to score the refined conformations,
we used a standard potential energy function with the OPLSAA force field. The effectiveness of the
search was assessed using a variety of methods. The robustness of sampling was measured by the
energy yield function, which quantifies the number of peptide decoys residing in
an energetic funnel. Furthermore, the potential energy-dependent Pareto fronts were calculated to
elucidate dissimilarities between peptide conformations and the native state as observed by x-ray
crystallography. The following conclusions can be drawn: (i) our
results suggest that the potential energy landscape has a very rugged nature and is perhaps self-similar, i.e., has a similar character on different metric scales [1]; (ii) the potential energy changes
caused by small-scale movements are unphysically large. This fact is perhaps related to
the analytical form of force fields, whose multimodality causes the ruggedness of potential energy
landscapes [2].
J. Rydzewski would like to acknowledge financial support from The National Science Centre,
Poland (grant 2015/19/N/ST3/02171).
∗
To whom the correspondence should be addressed: [email protected]
[1] R. Elber and M. Karplus. Science 235(4786), 318–321, 1987.
[2] J. Higo, N. Ito, M. Kuroda, S. Ono, N. Nakajima and H. Nakamura. Prot. Sci. 10(6), 1160–1171, 2001.
Ligand diffusion pathways in cytochrome P450cam
J. Rydzewski1, ∗ and W. Nowak1
1
Institute of Physics, Faculty of Physics, Astronomy and Informatics,
Nicolaus Copernicus University, Grudziadzka 5, 87-100 Torun, Poland
Computational simulations in molecular biophysics describe in atomic detail the structure, dynamics and many functions of biological macromolecules. The process of ligand diffusion inside
proteins is an example of a complex dynamical event that can be modeled using molecular dynamics simulations. The study of biomolecular interactions between a ligand and its biological
target is of paramount importance for the design of novel drugs. Because of that, identifying
the ligand access/egress pathways and understanding how ligands migrate through labyrinthine
tunnels in proteins is an area of pivotal interest, which has spurred the development of several approaches in computational biophysics. Unfortunately, the process of ligand dissociation is
challenging to study experimentally and in the absence of time-resolved crystallography experiments on ligand intermediates, the actual ligand expulsion pathways remain to a large extent
undetermined. Moreover, the complex topology of channels in proteins often leads to difficulties in
modeling of the ligand escape pathways by classical molecular dynamics simulations, thus rendering
both experimental and computational techniques difficult to apply. We report a recently developed computational methodology involving reconstruction of reaction coordinates of the ligand
diffusion by enhanced sampling during molecular dynamics simulations [1]. Moreover, we briefly
describe machine-learning procedures that can be helpful during post-processing of the ligand diffusion paths. Namely, we report an application of a nonlinear dimensionality reduction method to
represent the high-dimensional configuration space of the ligand-protein dissociation process in a
way facilitating interpretation [2]. We illustrate the above methods on cytochrome P450cam.
J. Rydzewski would like to acknowledge financial support from The National Science Centre,
Poland (grant 2015/19/N/ST3/02171).
∗
To whom the correspondence should be addressed: [email protected]
[1] J. Rydzewski and W. Nowak. J. Chem. Phys. 143(12), 124101, 2015.
[2] J. Rydzewski and W. Nowak. J. Chem. Theory Comput. 12, 2110–2120, 2016.
Modeling of transcription factor binding sites–a machine learning approach
Karolina Smolińska,1, 2, ∗ Marcin Pacholczyk,1 and Marek Kimmel1, 3
1
Institute of Automatic Control, Silesian University of Technology, Gliwice, Poland
2
Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
3
Department of Statistics, Rice University, Houston, TX, USA
Transcription factor binding sites (TFBSs) are important for many intracellular processes, which is
why new methods for modeling and detecting TFBSs in DNA continue to be developed.
The TFBSs are traditionally modeled by Position Weight Matrices (PWMs) obtained either computationally or from experimental data. We propose a modification of the computational approach of Alamanova et al. [1], implemented as the 3DTF server by Gabdoulline et al. [2]. The method requires
crystal structures of TF-DNA complexes. We observed that tuning the Boltzmann factor weights
used to convert calculated energies into nucleotide probabilities has a significant impact on the quality of the resulting PWM. Consequently, we developed a method for improving PWM quality
based on receiver operating characteristic (ROC) curves and 10-fold cross-validation. The
best-performing PWM was selected by the area under the curve (AUC). We
applied the method to data available for members of the NF-κB family (p50p50, p50p65,
p50RelB), for p53, and for other TFs such as HSF1 and ERα.
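The conversion of energies to nucleotide probabilities and the AUC-based selection can be sketched as follows. This is a minimal illustration, not the 3DTF implementation: the function names, the toy energies and the parameter name `beta` for the Boltzmann weight are assumptions made here, and the 10-fold cross-validation is omitted.

```python
import math

def energies_to_pwm(energies, beta):
    """Convert per-position energies to nucleotide probabilities.

    energies: list of dicts (one per site position) mapping A/C/G/T to a
              calculated energy, lower meaning more favourable (assumed).
    beta:     Boltzmann weight, the parameter being tuned.
    """
    pwm = []
    for column in energies:
        e_min = min(column.values())  # shift energies for numerical stability
        weights = {b: math.exp(-beta * (e - e_min)) for b, e in column.items()}
        total = sum(weights.values())
        pwm.append({b: w / total for b, w in weights.items()})
    return pwm

def score(pwm, site):
    """Log-probability of a candidate site under the PWM."""
    return sum(math.log(col[base]) for col, base in zip(pwm, site))

def auc(pos_scores, neg_scores):
    """Area under the ROC curve computed as a rank statistic (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Toy energies and sites, invented purely for illustration.
toy_energies = [
    {"A": -2.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": -1.5, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": -1.0, "T": 0.0},
]
positives = ["ACG", "ACT", "AAG"]
negatives = ["TTT", "GGA", "CAT"]

for beta in (0.0, 0.5, 1.0, 2.0):
    pwm = energies_to_pwm(toy_energies, beta)
    value = auc([score(pwm, s) for s in positives],
                [score(pwm, s) for s in negatives])
    print(beta, round(value, 3))
```

With `beta = 0` every nucleotide gets probability 0.25 and the matrix cannot discriminate at all, which is why the choice of weight matters.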
We verified the effectiveness of TFBS detection by the improved 3DTF matrices on experimental data
from the TRANSFAC database. To test the presented technique, we compared matrices constructed with the
unmodified approach of Alamanova et al., original PWMs downloaded from the 3DTF server, matrices
from the 3DTF server improved by our method, and matrices from the TRANSFAC database.
The comparison shows significant similarity and comparable performance between matrices
improved by our method and experimental matrices (TRANSFAC). The proposed approach can
be a promising alternative to experimental techniques of detecting TFBSs.
This work has been supported by Polish National Science Centre funds under grants SYMFONIA 3: UMO 2015/16/W/NZ2/00314 based at the Institute of Computer Science, Polish Academy
of Sciences, Warsaw, Poland, OPUS DEC-2012/05/B/NZ2/01618 and BKM based at the Institute
of Automatic Control, Silesian University of Technology.
∗ To whom correspondence should be addressed: [email protected]
[1] Alamanova D., Stegmaier P., Kel A. Creating PWMs of transcription factors using 3D structure-based
computation of protein-DNA free binding energies. BMC Bioinformatics, (2010) May 3;11:225
[2] Gabdoulline R., Eckweiler D., Kel A., and Stegmaier P.: 3DTF: a web server for predicting transcription
factor PWMs using 3D structure-based energy calculations. Nucl. Acids Res. (2012) 40 (W1): W180–W185
Pattern recognition approach to rheumatic diseases study
Beata Sokolowska,1, ∗ Marta Hallay-Suszek,2 Leszek Czerwosz,1 Teresa
Sadura-Sieklucka,3 Krystyna Ksieżopolska-Orlowska,3 and Bogdan Lesyng1, 4
1 Mossakowski Medical Research Center, Polish Academy of Sciences, Warsaw, Poland
2 Interdisciplinary Center for Mathematics and Computational Modelling, University of Warsaw, Poland
3 Institute of Rheumatology, Warsaw, Poland
4 Faculty of Physics, University of Warsaw, Poland
Rheumatic diseases (RDs) are the most common diseases of civilization and their incidence is
close to 4-5% [1, 2]. Osteoarthritis (OA) and rheumatoid arthritis (RA) are two serious RDs and
they account for a significant percentage of musculoskeletal disability. OA is a chronic disease
associated with ageing and is characterized by irreversible damage to the joint structure.
RA is a chronic auto-inflammatory disorder that leads to cartilage destruction, bone erosion,
and, subsequently, to joint deformities. The etiology of the RDs remains unknown. The
clinical data was gathered using a posturographic platform of Pro-Med Posturography Computer
System [3]. In the proposed analytical approach, the algorithms of the pattern recognition were
used for differentiation between healthy subjects and rheumatic patients. In the first step the
k-NN classifier was constructed and then the misclassification rates (Er) using the leave-one-out
method were computed. The following posturographic parameters were treated as features: (i) the
average radius of sways, (ii) the developed area and (iii) the total length of posturograms, as well
as directional components of sways, such as (iv) the length of left-right motions and (v) the length
of forward-backward motions. In addition, (vi) the biofeedback coordination estimated the efficiency
of the posture self-correction. Three standard posturographic tests were applied: with eyes open
(EO) or closed (EC), and with visual biofeedback control (BF). In summary: (i) the performed
analysis of the posturographic parameters with the proposed pattern recognition approach allows
identification of the rheumatic and healthy groups, (ii) the BF test is more effective than the others (Er
values of the BF test were significantly smaller than EO and EC values, both before and after feature
selection).
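The k-NN classification with a leave-one-out misclassification rate (Er) can be sketched as below. This is a generic illustration, not the k-NN software acknowledged by the authors; the two-feature toy samples and the Euclidean distance are assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def leave_one_out_error(samples, k):
    """Fraction of samples misclassified when each one is held out in turn."""
    errors = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if knn_predict(rest, features, k) != label:
            errors += 1
    return errors / len(samples)

# Invented toy data: (feature vector, group label).
toy_samples = [
    ((1.0, 1.1), "healthy"), ((0.9, 1.0), "healthy"), ((1.2, 0.8), "healthy"),
    ((3.0, 3.2), "patient"), ((3.1, 2.9), "patient"), ((2.8, 3.0), "patient"),
]
print(leave_one_out_error(toy_samples, k=1))
```

An odd `k` avoids tied votes when there are only two groups.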
Owing to its non-invasiveness and simplicity, posturography combined with
the pattern recognition approach may be an important, helpful and reliable tool for clinicians
and rehabilitation specialists in diagnosing and monitoring patients with balance disturbances [3, 4].
The authors would like to thank Adam Jóźwik for releasing his k-NN software. The study was
supported by statutory budget of the Polish Academy of Sciences Mossakowski Medical Research
Center, computations and analysis were carried out using the computational infrastructure of the
Biocentrum-Ochota project.
∗ To whom correspondence should be addressed: [email protected]
[1] R. Wong, A.M. Davis, E. Badley et al., Prevalence of arthritis and rheumatic diseases around the world.
A growing burden and implications for health care needs. 2010 Arthritis Community Research and
Evaluation Unit.
[2] B. Kwiatkowska, F. Raciborski, M. Maślińska et al., RAPORT: Wczesna diagnostyka chorób
reumatycznych—ocena obecnej sytuacji i rekomendacje zmian. Wyd. Instytut Reumatologii im. prof.
dr. hab. med. Eleonory Reicher, 2014, Warszawa.
[3] B. Sokolowska, L. Czerwosz, M. Hallay-Suszek et al., Posturography in patients with rheumatoid arthritis
and osteoarthritis. Adv Exp Med Biol, 2015, 2, 63–70.
[4] L. Czerwosz, E. Szczepek, B. Sokolowska et al., Posturography in differential diagnosis of normal pressure
hydrocephalus and brain atrophy. Adv Exp Med Biol, 2013, 755, 311–324.
Massively parallel sequencing in diagnostics of genodermatoses
Justyna Sota,1, ∗ Katarzyna Wertheim-Tysarowska,1 Dominika Śniegórska,1 Alicja
Grabarczyk,1 Tomasz Gambin,1 Anna Kutkowska-Kaźmierczak,1 Katarzyna Końska,2
Jolanta Wierzba,3 Katarzyna Woźniak,4 Cezary Kowalewski,4 and Jerzy Bal1
1 Department of Medical Genetics, Institute of Mother and Child, Warsaw, Poland
2 Genetic Counseling, University Children’s Hospital of Cracow, Poland
3 Genetic Counseling for Adults and Children, University Clinical Center, Gdansk, Poland
4 Department of Dermatology, Medical University of Warsaw, Poland
Introduction: The genodermatoses are a large group of inherited skin disorders, often with
additional multisystem symptoms. Their genetic basis is very heterogeneous and includes mutations
in more than 100 genes. Next generation sequencing technologies enable massively parallel
sequencing (MPS) of all genes linked with genodermatoses.
Patients and Methods: Five patients with clinical symptoms of different types of genodermatoses were subjected to targeted MPS. Mapping and variant calling were performed with the
use of BWA (hg19) and GATK algorithms, respectively. Variant calls were annotated using VariantStudio (Illumina) and Annovar software (http://annovar.openbioinformatics.org). Annotated data were filtered according to 1) appropriate gene panel, 2) parameters of sequencing
quality, 3) frequency of identified variants in population and in-house databases, and 4) in silico
prediction of the effect of mutation on protein function. Identified point mutations were confirmed
by Sanger sequencing and, when possible, cosegregation analysis within the family was performed.
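The four filtering steps listed above can be sketched as a simple sequential filter. The field names, thresholds and toy records below are hypothetical and chosen only for illustration; they are not taken from VariantStudio, Annovar or the authors' pipeline.

```python
def filter_variants(variants, panel, min_depth=20, min_qual=30.0, max_freq=0.01):
    """Apply the four filters in order; all thresholds are illustrative."""
    kept = []
    for v in variants:
        if v["gene"] not in panel:                            # 1) gene panel
            continue
        if v["depth"] < min_depth or v["qual"] < min_qual:    # 2) sequencing quality
            continue
        if max(v["pop_freq"], v["inhouse_freq"]) > max_freq:  # 3) frequency
            continue
        if not v["predicted_deleterious"]:                    # 4) in silico prediction
            continue
        kept.append(v)
    return kept

# Invented annotated calls (hypothetical record layout).
toy_calls = [
    {"id": "v1", "gene": "KRT5", "depth": 80, "qual": 60.0,
     "pop_freq": 0.0, "inhouse_freq": 0.0, "predicted_deleterious": True},
    {"id": "v2", "gene": "KRT5", "depth": 8, "qual": 60.0,
     "pop_freq": 0.0, "inhouse_freq": 0.0, "predicted_deleterious": True},
    {"id": "v3", "gene": "TTN", "depth": 90, "qual": 70.0,
     "pop_freq": 0.0, "inhouse_freq": 0.0, "predicted_deleterious": True},
    {"id": "v4", "gene": "KRT1", "depth": 90, "qual": 70.0,
     "pop_freq": 0.2, "inhouse_freq": 0.1, "predicted_deleterious": True},
    {"id": "v5", "gene": "KRT1", "depth": 90, "qual": 70.0,
     "pop_freq": 0.0, "inhouse_freq": 0.0, "predicted_deleterious": False},
]
candidates = filter_variants(toy_calls, panel={"KRT1", "KRT5"})
print([v["id"] for v in candidates])
```

Variants that survive such a filter would still need confirmation, as the abstract describes, by Sanger sequencing and cosegregation analysis.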
Results: Clinical diagnosis was confirmed for all five patients using bioinformatic and cosegregation analysis. The first case was a 33-year-old female with palmoplantar keratoderma and generalized blistering of skin, resembling symptoms of epidermolysis bullosa simplex (EBS). Analysis
revealed a deleterious splice-site mutation in one allele of the KRT1 gene (NM_006121.3: c.591+1G>A),
a gene in which mutations have not been associated with EBS so far. The second case was a newborn who died
shortly after birth with clinical suspicion of epidermolysis bullosa hereditaria, in whom we identified a deleterious de novo point mutation in one allele of the KRT5 gene (NM_000424.3: c.527A>G
(p.Asn176Ser)) linked with autosomal dominant EBS. The third case was a newborn with clinical
symptoms of autosomal recessive harlequin ichthyosis. We detected a previously unreported deletion
of a single nucleotide in one allele and a deleterious point mutation in the second allele of the ABCA12
gene (NM_173076.2: c.6194del and c.5848C>T (p.Asn2065ThrfsTer3 and p.Arg1950Ter)). In the
fourth case (a 1-year-old girl with clinical diagnosis of epidermolytic keratoderma) we found a deleterious de novo point mutation in one allele of the KRT10 gene (NM_000421.3: c.467G>A (p.Arg156His)).
Finally, in an 8-year-old girl with clinical symptoms of junctional epidermolysis bullosa (JEB),
our analysis revealed two deleterious mutations in the COL17A1 gene: a paternal pathogenic mutation
NM_000494.3: c.1826G>A (p.Gly609Asp) and a maternal novel stop-gain mutation NM_000494.3:
c.1490_1491delinsT (p.Ala497ValfsTer23).
Conclusion: MPS is an efficient and cost-effective method in molecular diagnostics of genodermatoses, which often manifest with high phenotypic and genetic heterogeneity. In patients
with nonspecific symptoms of skin disorder or in cases where clinical diagnosis requires molecular
analysis of a large range of genes, MPS should be considered as a first-choice method.
Supported by 2014/13/D/NZ5/03304.
∗ To whom correspondence should be addressed: [email protected]
CNVs detection algorithm as a useful diagnostic tool in targeted NGS analysis
Justyna Sota,1, ∗ Tomasz Gambin,1 Katarzyna Niepokój,1 Agnieszka Charzewska,1
Anna Kutkowska-Kaźmierczak,1 Anna Jakubiuk-Tomaszuk,2 Alicja Grabarczyk,1
Katarzyna Sobecka,1 Barbara Wiśniowiecka-Kowalnik,1 Marta Kedzior,1 and Monika Gos1
1 Department of Medical Genetics, Institute of Mother and Child, Warsaw, Poland
2 Department of Pediatric Neurology and Rehabilitation, Medical University of Bialystok, Bialystok, Poland
Background: DNA Copy-Number Variations (CNVs), next to Single Nucleotide Variations
(SNVs), are an important source of genetic variability responsible for both population diversity
and rare genetic diseases. Recent studies have shown that CNVs might be found in approximately
15% of genes in which point mutations are associated with specific monogenic diseases. Moreover,
clinical symptoms of patients with recurrent and rare CNVs can overlap with phenotypes caused
by SNVs. Therefore, SNVs should be analysed together with CNVs. Progress
in next generation sequencing (NGS) technologies and computational algorithms enables
simultaneous identification of SNVs and CNVs in the human genome.
Patients and methods: Seventy-seven patients with various clinical phenotypes were subjected to targeted sequencing of the "clinome", i.e. clinically relevant regions in the genome (4813
genes, TruSight One, Illumina). Mapping and variant calling were performed with the use of BWA
(hg19) and GATK algorithms, respectively. Variant calls (SNVs) were annotated using Annovar
software (http://annovar.openbioinformatics.org). Annotated data were analysed according
to the phenotype-corresponding gene panel. The CNV analysis was performed with CoNIFER (copy number inference
from exome reads, http://conifer.sourceforge.net/), a computational algorithm implemented as a set of Python programs, run on the data of all 77 samples. Positive
results were confirmed using the array-CGH or MLPA method.
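The idea behind read-depth CNV detection across a cohort can be sketched as follows. This is a deliberately simplified stand-in, not CoNIFER itself: it z-scores already normalised exon depths across samples and reports runs of consecutive outlying exons, and it omits the systematic-noise removal step that a real tool applies. The function names, the threshold of 1.5 and the minimum run of three exons are assumptions made for the example.

```python
import statistics

def exon_zscores(depth):
    """Z-score every exon across samples (rows = samples, columns = exons)."""
    z = [[0.0] * len(depth[0]) for _ in depth]
    for j in range(len(depth[0])):
        column = [row[j] for row in depth]
        centre = statistics.median(column)
        spread = statistics.pstdev(column)
        for i, value in enumerate(column):
            z[i][j] = (value - centre) / spread if spread > 0 else 0.0
    return z

def call_runs(z_row, threshold=1.5, min_exons=3):
    """Return (first_exon, last_exon, kind) for runs of outlying exons."""
    calls, start, kind = [], None, None
    for j, z in enumerate(z_row + [0.0]):  # sentinel closes a trailing run
        current = "del" if z <= -threshold else "dup" if z >= threshold else None
        if current != kind:
            if kind is not None and j - start >= min_exons:
                calls.append((start, j - 1, kind))
            start, kind = j, current
    return calls

# Invented depths: four normal samples and one with halved depth in exons 1-3.
toy_depth = [[100.0] * 5 for _ in range(4)] + [[100.0, 50.0, 50.0, 50.0, 100.0]]
toy_z = exon_zscores(toy_depth)
print(call_runs(toy_z[4]))
```

Because each sample is judged against the rest of the cohort, running the analysis on all samples together, as described above, is what makes the comparison possible.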
Results: Molecular confirmation of diagnosis was obtained in 39/77 (50.6%) cases. Using
the CoNIFER algorithm we identified likely pathogenic CNVs in 4/39 (10.3%) patients. The first case is
a 7-year-old boy with clinical symptoms of Diamond-Blackfan anemia (Aase-Smith syndrome II,
OMIM #105650). There were no pathogenic SNVs in the selected gene panel (RPS7, RPS10, RPS17,
RPS19, RPS24, RPS26, RPL5, RPL11, RPL15, RPL26, RPL35A). The CNV analysis revealed
a deletion within the chr1:91861459-92841875 region, which includes the RPL5 gene. The second case
is a 4-year-old girl with clinical symptoms of Kleefstra syndrome (OMIM #610253). There were
no pathogenic SNVs in the selected gene panel (EHMT1, MBD5, KMT2C, SMARCB1, NR1I3). The
CNV analysis showed a deletion encompassing the chr9:139874623–140728986 region, in which
the EHMT1 gene is located. The third case is a 6-year-old girl with a Noonan syndrome phenotype with
hypertrophic cardiomyopathy. No pathogenic SNV was identified within RASopathy genes, and
the CNV analysis revealed a duplication in the 22q11.21 region, which includes the LZTR1 gene.
The fourth case is a 19-year-old male patient with non-syndromic deafness. The SNV analysis involved >150
genes but no pathogenic variants were identified. The CNV analysis showed a deletion of
the STRC gene and part of the CATSPER2 gene.
Conclusions: Likely pathogenic CNVs corresponding with phenotypes were identified in 4/39
(10.3%) cases with molecular confirmation of diagnosis. Therefore, a CNV detection algorithm is a
valuable step of NGS data analysis and should be performed together with SNV annotation.
∗ To whom correspondence should be addressed: [email protected]
Distribution analysis of L-SAARs in signal peptides across multiple
eukaryotes
Michal Stolarczyk,1, ∗ Pawel Labaj,2 and Joanna Polańska1
1 Silesian University of Technology, Gliwice, Poland
2 University of Natural Resources and Life Sciences, Vienna, Austria
It has been shown that DNA sequence repetitions are responsible for diseases called trinucleotide
repeat disorders. The most studied one is Huntington disease, which develops when
polyglutamine tracts encoded by reiterations of the CAG codon extend beyond a certain length. Other known
disorders are attributable to repeats of glutamic acid (Friedreich’s ataxia) [1], arginine (Fragile X
syndrome) [2] or leucine (Myotonic dystrophy) [3]. Taking this into consideration, trinucleotide
repeats are unique structures of DNA and clearly require thorough analysis. Translated into
peptide sequences, trinucleotide repeats are especially important as they directly influence processes
taking place in the cells of living organisms. Such sequences can give rise to amino acid repeats (AARs).
Single amino acid repeats (SAARs) are reiterations of single amino acids within peptides. They
occur more frequently in Eukaryotes than in Prokaryotes, which suggests that their creation is
a relatively recent evolutionary process [4]. According to COPASAAR, a database for proteomic
analysis of single amino acid repeats, leucine is the most abundant amino acid creating SAARs
both in Prokaryotes and Eukaryotes. Consequently, even a preliminary analysis of the proteomes
suggests the significance of repeats containing leucine. The vast majority of leucine reiterations
(90% of all in human) is located at the amino-terminus of proteins. In proteins that undergo
translocation, this region is referred to as a signal peptide. Signal peptides have a discernible three-domain
structure: a basic domain of variable length, a 7-13 residue hydrophobic domain, and a
slightly polar domain. Consequently, hydrophobic single amino acid repeats are abundant in signal
peptides. However, according to the literature, leucine repeats are overrepresented even more
than others [5]. This raises the question of what purpose, beyond hydrophobicity, the overrepresentation of leucine repeats serves in signal peptides.
Here, we analyze the distribution of leucine repeats found in signal peptides of orthologous
Eukaryotic proteins. Thorough analysis facilitates the detection of trends determining the direction
of evolution and prospectively the determination of the origins of L-SAARs in signal peptides. The
study focused chiefly on mammals; the dataset consisted of the proteomes of six organisms:
Human, Chimpanzee, House mouse, Cow, Red jungle fowl and Tropical clawed frog. To
determine the origins of L-SAARs, the lengths of both L-runs and signal peptides
were analyzed. This approach aids rejection of the hypothesis that AARs arose only through the
replication slippage mechanism, as is thought for tandem repeats in general [6]. The next part of
the study was an analysis of amino acid to leucine changes. This stage suggested that
L-SAARs could have originated from point mutations and that replication slippage is not the only
phenomenon shaping the genome. However, to confirm this there is a need to investigate the
variability and composition of these repeats along the evolutionary path at the nucleotide level.
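Locating leucine runs and restricting them to the signal peptide can be sketched with a regular expression. The minimum run length of three, the toy sequence and the signal peptide length are assumptions made for illustration; the abstract does not state the thresholds used in the study.

```python
import re

def leucine_runs(sequence, min_length=3):
    """Return (start, length) of every run of at least min_length leucines."""
    pattern = re.compile(r"L{%d,}" % min_length)
    return [(m.start(), len(m.group())) for m in pattern.finditer(sequence)]

def runs_in_signal_peptide(sequence, signal_length, min_length=3):
    """Keep only runs lying entirely within the first signal_length residues."""
    return [(start, length)
            for start, length in leucine_runs(sequence, min_length)
            if start + length <= signal_length]

# Invented sequence; positions are 0-based.
toy_protein = "MKTLLLLLAVLLLAGSQAEEKLLLK"
print(leucine_runs(toy_protein))
print(runs_in_signal_peptide(toy_protein, signal_length=18))
```

Collecting the run lengths together with the signal peptide lengths for each ortholog gives the paired measurements analyzed above.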
∗ To whom correspondence should be addressed: [email protected]
[1] V. Campuzano et al., Science, 1996, 271, 1423–1427.
[2] Peprah, E., Annals of Human Genetics, 2012, 76, 178–191.
[3] Mahadevan, M. et al., Science, 1992, 255, 1253–1255.
[4] Depledge, D. P. et al., BMC bioinformatics, 2005, 6, 196.
[5] Labaj, P. P. et al., FEBS Journal, 2010, 277, 3147–3157.
Conservation metrics overview - pros and cons
Michal Stolarczyk,1, ∗ Alicja Pluciennik,1 Tomasz Magdziarz,1 and Artur Góra1
1 Tunneling Group, Biotechnology Centre, Silesian University of Technology, Gliwice, Poland
Protein engineering can substantially benefit from computer-aided rational design methods.
Identification of amino acids essential for the intrinsic or engineered activity of an enzyme is one
of many possible preliminary steps in improving enzyme functionality. There are a number of
approaches aiming at identification of amino acids that influence enzyme activity, selectivity or stability.
Following the hypothesis that functionally important amino acids are relatively highly conserved,
conservation analysis of each multiple sequence alignment (MSA) position seems to be the best
suited method for this purpose [1].
Conservation as an attribute of an MSA can, however, be defined in various ways, depending
on the research purpose. Therefore, there is an abundance of conservation metrics in the
literature [2]. Their performance differs considerably, so different metrics may suit different analyses.
We therefore present a comprehensive comparison of assorted conservation metrics available
in the literature together with our own concept of defining the conservation score. In terms of resolution, it outperforms metrics based on
entropies and metrics weighted according to substitution matrices in high-conservation regions.
At the same time, it stays correlated with those metrics over the rest of the conservation score range.
This characteristic is a consequence of employing the conservation duality and attaching
importance to the number of amino acids at the MSA position under consideration rather than to
their fractions at this position. We also show that coupled analysis can provide the researcher
with more valuable information on an MSA position than applying only one conservation metric
for that purpose.
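The difference between a fraction-based and a count-based view of one MSA column can be illustrated as follows. The entropy score is a generic example of the entropy-based family mentioned above, and the distinct-residue count is only a rough illustration of looking at the number of amino acids; neither is the authors' proposed score, whose definition is not given in the abstract.

```python
import math
from collections import Counter

def column_entropy_score(column):
    """1 minus the Shannon entropy (log base 20) of residue fractions; gaps ignored."""
    residues = [r for r in column if r != "-"]
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((c / total) * math.log(c / total, 20) for c in counts.values())
    return 1.0 - entropy

def column_residue_count(column):
    """Number of distinct amino acids observed at the position; gaps ignored."""
    return len({r for r in column if r != "-"})

# Two invented columns: one substitution among 4 and among 8 sequences.
for column in ("AAAV", "AAAAAAAV", "AAAA"):
    print(column, round(column_entropy_score(column), 3), column_residue_count(column))
```

The two highly conserved columns receive different entropy scores but the same residue count, which shows why combining metrics can be more informative than using a single one.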
∗ To whom correspondence should be addressed: [email protected]
[1] M. Kimura, Nature, 1968, 217, 624 – 626.
[2] W.S.J. Valdar, Proteins, 2002, 48, 227–241.
Influence of the primary analysis on discovering differentially expressed genes
based on RNA-Seq data
Alicja Szabelska,1, ∗ Joanna Zyprych-Walczak,1 Idzi Siatkowski,1 and Michal Okoniewski2
1 Department of Mathematical and Statistical Methods, Poznan University of Life Sciences
2 Scientific IT Services, ETH Zurich
RNA-Seq uses the capabilities of next-generation sequencing (NGS) technologies to measure the
presence of sequences transcribed from all the genes simultaneously. Those measurements can be
used to estimate the differential expression between the groups of biological samples or to detect
novel transcripts and isoforms of genes. There are a number of statistical and computational methods
that can tackle the analysis and management of the massive and complex datasets produced by the
sequencers. Still, those methods are often prone to distortion by algorithmic and technological
artifacts as well as noise added by the laboratory methods. The analysis of RNA-seq data starts
with primary analysis, which is most often mapping (alignment) to the genome. Then there is a
stage of genomic feature extraction, counting and normalization, which produces the input
to the statistical tests. This study is focused on a comprehensive comparison of two different
mappers [1, 2] and five normalization methods [3] and their impact on the results
of gene expression analysis. We show that primary analysis has a profound effect on the results of
the analysis. In particular, we show that for many important genes, the calculated expression levels and
differential expression can differ widely, which is in line with the
findings of a recent study [4]. In conclusion, we provide suggestions on possible good practices that
can bring RNA-seq data analysis closer to the "biological truth" that it attempts to find.
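How the choice of normalization alone can change apparent expression differences can be shown on a toy count table. The two schemes below (total-count scaling and median-of-ratios size factors) are common choices used here as assumptions; the abstract does not name the five methods that were compared, and the counts are invented.

```python
import math
import statistics

def library_size_factors(counts):
    """Total-count scaling: each sample's library size relative to the mean."""
    totals = [sum(col) for col in zip(*counts)]
    mean_total = sum(totals) / len(totals)
    return [t / mean_total for t in totals]

def median_ratio_factors(counts):
    """Median-of-ratios size factors, using only genes without zero counts."""
    log_ratios = [[] for _ in counts[0]]
    for row in counts:
        if min(row) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / len(row)
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(statistics.median(r)) for r in log_ratios]

# Rows = genes, columns = two samples; the last gene dominates sample 2.
toy_counts = [[100, 100], [200, 200], [50, 50], [10, 1000]]

for name, factors in (("total count", library_size_factors(toy_counts)),
                      ("median of ratios", median_ratio_factors(toy_counts))):
    normalised = [c / f for c, f in zip(toy_counts[0], factors)]
    print(name, [round(v, 1) for v in normalised])
```

In this toy table the first gene has identical raw counts in both samples, yet total-count scaling makes it look several-fold different because one highly expressed gene inflates the second library.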
∗ To whom correspondence should be addressed: [email protected]
[1] Liao Y, Smyth GK and Shi W. The Subread aligner: fast, accurate and scalable read mapping by
seed-and-vote. Nucleic Acids Research, 2013, 41(10):e108
[2] Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of
transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biology, 2013, 14:R36
[3] Leng N, Dawson J, Thomson J, et al. EBSeq: an empirical bayes hierarchical model for inference in
RNA-seq experiments. University of Wisconsin: Tech. Rep. 2012, 226
[4] Robert C and Watson M, Errors in RNA-Seq quantification affect genes of relevance to human disease.
Genome Biology, 2015, 16:177
Towards a simple mathematical model of hampered diffusion in biological
setting
Piotr Weber,1, ∗ Wieslaw Nowak,1 and Piotr Peplowski1
1 Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Poland
The cell membrane is a barrier that surrounds the cytoplasm of living cells and separates the
intracellular components from the external environment. It also allows selective transport of
molecules and regulates what enters and exits the cell. Molecules move
across the membrane by a number of transport mechanisms. Despite rich
knowledge about these mechanisms, some cases remain puzzling to the scientific community.
Several works have shown that the residence times of small peptides in the cell membrane can be very
long, much longer than for a simple diffusion process [1]. Measurements of protein motion
in cell membranes are also frequently compatible with a subdiffusive process [2–6].
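\[
\frac{\partial W(x,t)}{\partial t} + a\,{}^{C}_{0}D^{\alpha}_{t} W(x,t) = V \frac{\partial W(x,t)}{\partial x} + \frac{\sigma^{2}}{2} \frac{\partial^{2} W(x,t)}{\partial x^{2}},
\]
where ${}^{C}_{0}D^{\alpha}_{t}$ is a Caputo fractional derivative. This asymptotic form is determined by parameters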
describing underlying stochastic motion. We also show density evolution according to fractional
differential equation for asymptotic model and obtain a solution for various model parameters.
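For reference, the Caputo fractional derivative appearing in the equation above is usually defined as follows. The restriction 0 < α < 1 is an assumption made here, since the abstract does not state the range of α.

\[
{}^{C}_{0}D^{\alpha}_{t} f(t) = \frac{1}{\Gamma(1-\alpha)} \int_{0}^{t} \frac{f'(s)}{(t-s)^{\alpha}}\, ds , \qquad 0 < \alpha < 1 .
\]

For sufficiently smooth f, this expression tends to the ordinary first derivative f'(t) as α approaches 1, so the fractional term can be read as a memory-weighted generalization of the time derivative.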
∗ To whom correspondence should be addressed: [email protected]
[1] K. Kuczera, private communication.
[2] F. Höfling, T. Franosch, Reports on Progress in Physics, 2013, Vol. 76, No. 4, 046602.
[3] I. Goychuk, P. Hänggi, Physica A, 2003, Vol. 325, 9-18.
[4] I. Goychuk, P. Hänggi, Physical Review E, 2004, Vol. 70, 051915.
[5] T.F. Nonnenmacher, D.J.F. Nonnenmacher, Physics Letters A, 1989, Vol. 140, 323-326.
[6] P. Weber, P. Peplowski, Acta Physica Polonica B, 2013, 44, 1173-1184.
StructAnalyzer - a tool for sequence vs. structure similarity analysis
Jakub Wiedemann1, ∗ and Maciej Milostan2
1 Institute of Computing Science & European Centre for Bioinformatics and Genomics, Poznan University of Technology
2 Institute of Bioorganic Chemistry, Polish Academy of Sciences, Z. Noskowskiego 12/14, 61-704 Poznan, Poland
Comparative analysis of structures and biological sequences may lead to the determination of
characteristic structural and functional elements of biological compounds. Thorough exploration
of sequence-structure space makes it possible to identify, on the one hand, molecules (protein or RNA) with totally
different sequences but similar structures and, on the other hand, molecules with similar sequences
but different structures. The latter case is currently the most interesting from our perspective,
because it reveals sequences prone to significant structural change due to a small number
of point mutations. Analysis of multiple sequences and structures is demanding in terms of
computational time and complexity. Therefore, parallel processing is advisable
in such cases and yields results in much less time. Here, we present StructAnalyzer,
a tool for sequence versus structure similarity analysis.
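The search for pairs with similar sequences but different structures, distributed over a pool of workers, can be sketched as below. This is not StructAnalyzer's code: the record layout, the `fold` label standing in for a real structural similarity measure, the identity threshold and the use of pre-aligned sequences are all assumptions made for the example. A thread pool is used for simplicity; for CPU-bound comparisons a process pool would be the more natural choice.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def sequence_identity(a, b):
    """Fraction of identical positions in two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def compare_pair(pair, entries):
    """Compare one pair of entries; returns identity and whether folds match."""
    i, j = pair
    identity = sequence_identity(entries[i]["seq"], entries[j]["seq"])
    same_fold = entries[i]["fold"] == entries[j]["fold"]
    return i, j, identity, same_fold

def similar_sequence_different_structure(entries, min_identity=0.8, workers=4):
    """All pairs above the identity threshold whose structure labels differ."""
    pairs = list(combinations(sorted(entries), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: compare_pair(p, entries), pairs))
    return [(i, j, identity) for i, j, identity, same in results
            if identity >= min_identity and not same]

# Invented entries (hypothetical identifiers, sequences and fold labels).
toy_entries = {
    "a": {"seq": "ACDEFGHIKL", "fold": "helix"},
    "b": {"seq": "ACDEFGHIKV", "fold": "sheet"},
    "c": {"seq": "MNPQRSTVWY", "fold": "helix"},
}
print(similar_sequence_different_structure(toy_entries))
```

The number of pairs grows quadratically with the number of entries, which is the reason the comparisons are worth distributing.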
∗ To whom correspondence should be addressed: [email protected]
The impact of the crossover operator on the results of evolutionary-based
algorithms in the problem of the genetic code optimization
Pawel Blażej,1 Malgorzata Wnetrzak,1, ∗ and Pawel Mackiewicz1
1 Department of Genomics, Faculty of Biotechnology, University of Wroclaw, ul. Joliot-Curie 14a, Wroclaw, Poland
One of the most popular theories concerning the origin and evolution of the standard genetic
code is the adaptive hypothesis. It assumes that the genetic code was optimized to minimize
harmful effects of mutations and translational errors leading to amino acid replacements in the
coded proteins [1, 2]. The best way to assess the extent of this optimization is a comparison of
the standard genetic code with optimized alternatives, which can be found in the space of possible
genetic codes. However, the total number of these possible codes is extremely large, i.e., greater
than 1.51×10^84 assuming 64 nucleotide triplets encoding 20 amino acids and three stop translation
signals. Therefore, in our searches, we used an Evolutionary Algorithm (EA) approach, as did the authors
of the previous works related to this topic [3]. However, their simulations were based only on
mutation operators. Since it is well known that EAs are founded not only on mutation but also on
crossover operators, we proposed crossover operators suitable for the considered models of the genetic
code and presented possible advantages of using them together with mutation to solve the problem
of genetic code optimization. We compared algorithms with various mutation and crossover
probabilities under two different models of the genetic code. The results indicate that the use
of the crossover operator can significantly improve the quality of solutions obtained in the single-objective optimization case. Our results demonstrate that the standard genetic code is only locally
optimized because it is possible to find alternatives that are at least two times better optimized.
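The size of the search space quoted above can be checked with a short calculation. The counting model is an assumption made here, since the abstract does not spell it out: each of the 64 triplets is assigned one of 21 meanings (20 amino acids plus a stop signal), and every meaning must be used at least once. Under that assumption the number of codes follows from inclusion-exclusion.

```python
from math import comb

def count_surjective_codes(codons=64, labels=21):
    """Number of assignments of codons to labels that use every label at least once."""
    return sum((-1) ** k * comb(labels, k) * (labels - k) ** codons
               for k in range(labels + 1))

n_codes = count_surjective_codes()
print(f"{n_codes:.3e}")
```

The result is of the order of 1.5×10^84, far too many codes to enumerate, which is why a heuristic search such as an evolutionary algorithm is used instead.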
∗ To whom correspondence should be addressed: [email protected]
[1] S. J. Freeland, T. Wu, N. Keulmann, Orig Life Evol Biosph, 2003, 33(4-5): 457-477
[2] R.D. Knight, S.J. Freeland, L.F. Landweber. Trends Biochem. Sci., 1999, 24, 241-247
[3] J. Santos, A. Monteagudo, BMC Bioinformatics, 2011, 12, 1-8
Correlated mutations select misfolded from properly folded proteins
Pawel P. Woźniak,1, ∗ Malgorzata Kotulska,1 and Gert Vriend2
1 Department of Biomedical Engineering, Wroclaw University of Science and Technology, Wroclaw, Poland
2 Centre for Molecular and Biomolecular Informatics, Radboudumc, Nijmegen, the Netherlands
Knowledge about the three dimensional structure of proteins is a prerequisite for studies on
their behavior, stability, or their role as target in drug design. The traditional methods for structure determination are either experimental, such as X-ray crystallography and NMR, or in silico
like homology modeling. Experimental methods are costly and time-consuming while homology
modeling is dependent on the availability of a homologous protein structure. Therefore, computational methods that allow for protein structure reconstruction from sequence only have long
been looked for. One of these is the recently developed direct coupling analysis (DCA) method
[1, 2]. The DCA method achieves the best results in residue-residue contact prediction from multiple
sequence alignments only. Predicted contacts are used as restraints in the reconstruction of the
three-dimensional structure of a protein. Unfortunately, the accuracy of present-day DCA methods
is on the order of 40% among the 100 strongest predicted contacts. This is insufficient for ab initio protein structure reconstruction. The results of DCA can, however, support protein structure
reconstruction in several ways.
We showed that the DCA algorithm is able to indicate the better structure among properly folded
and misfolded variants by the prediction of residue-residue contacts for these proteins. We counted
the number of correctly predicted contacts among the strongest 100 predictions made by DCA
for a set of obsolete PDB files and their successors and for 22 proteins for which the Decoys ’R’
Us database [3] provided properly folded and misfolded structures. These counts were related to
structure similarity scores, such as RMSD or TM-score [4]. DCA correctly predicts significantly
more contacts for properly folded structures than for misfolded ones. Our method works much
better for structures determined with X-ray crystallography than with NMR spectroscopy. We
discuss the most interesting cases. The method will not detect misfolded proteins per se, but when
a protein structure experimentalist needs to choose between alternative folds for the same protein,
DCA seems a useful aid.
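Counting correctly predicted contacts for alternative models of the same protein can be sketched as follows. The contact definition (distance between representative atoms below 8 Å, minimum sequence separation) and the toy coordinates and predictions are assumptions made for the example; the abstract does not give the definition used.

```python
import math

def contact_set(coords, cutoff=8.0, min_separation=3):
    """Residue pairs (i, j) whose representative atoms lie closer than cutoff."""
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts.add((i, j))
    return contacts

def correct_among_top(predictions, contacts, top_n=100):
    """Number of the top_n strongest predicted pairs that are contacts."""
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)[:top_n]
    return sum((i, j) in contacts for i, j, _ in ranked)

# Two invented six-residue models: a hairpin and a fully extended chain.
hairpin = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
           (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0)]
extended = [(3.8 * i, 0.0, 0.0) for i in range(6)]

# Invented predictions: (residue i, residue j, coupling score).
toy_predictions = [(0, 5, 0.9), (1, 4, 0.8), (2, 5, 0.3), (0, 3, 0.1)]

for name, model in (("hairpin", hairpin), ("extended", extended)):
    print(name, correct_among_top(toy_predictions, contact_set(model), top_n=3))
```

The model that agrees with more of the strongest predictions would be the preferred candidate, in the spirit of the comparison described above.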
∗ To whom correspondence should be addressed: [email protected]
[1] F. Morcos, A. Pagnani, B. Lunt, A. Bertolino, D.S. Marks, C. Sander, R. Zecchina, J.N. Onuchic, T.
Hwa, M. Weigt, Direct-coupling analysis of residue coevolution captures native contacts across many
protein families, 2011, Proc Natl Acad Sci U S A 108(49):E1293-301.
[2] C. Feinauer, M.J. Skwark, A. Pagnani, E. Aurell, Improving contact prediction along three dimensions,
2014, PLoS Comput Biol., 10(10):e1003847.
[3] R. Samudrala, M. Levitt, Decoys ’R’ Us: A database of incorrect protein conformations to improve
protein structure prediction, 2000, Protein Science 9: 1399-1401.
[4] Y. Zhang, J. Skolnick, TM-align: A protein structure alignment algorithm based on TM-score, 2005,
Nucleic Acids Research, 33: 2302-2309.
Mathematical modelling of immune response
Joanna Ziobro,1, ∗ Pawel Blażej,1 and Pawel Mackiewicz1
1 Department of Genomics, Faculty of Biotechnology, University of Wroclaw, ul. Joliot-Curie 14a, Wroclaw, Poland
The dynamic development of medicine and inventing new treatments requires modifying the
current and creating new models describing immunological reactions of human organism. The
immune response is a complex set of defensive reactions which includes the antigen recognition,
its neutralization and elimination. There are two mechanisms of immune response which are
interdependent: cellular and humoral response.
We analyzed two models that describe the humoral type of human immune response. The first
model describes the reaction between antibody and antigen. The second, Marchuk's model, is more
complicated because it takes into account the delay of lymphocyte proliferation with respect to
antigen presentation [1]. Each model has two stationary states: one describes the healthy
state of an organism and the other characterizes a chronic disease. We analyzed the stability of
these stationary states. Our studies show that the first model does not describe all the
main processes of the immune response, and that even a very strong immune system is not always
able to deal with a large dose of antigen. The stability of the healthy state in Marchuk's model does
not depend on the delay. However, the delay can be important in the case of the chronic state [2].
We present a method for analyzing systems of delay differential equations [3]. Equations of this
type describe well complex biological processes, such as responses to infection, that often occur
with some delay. Systems of delay differential equations have several features that make their
analysis more complicated than that of systems of ordinary differential equations. We
studied how the stability of the chronic state depends on the length of the delay.
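The role of the delay can be explored numerically. The sketch below is not the authors' code: it assumes one commonly cited form of Marchuk's simplest model (antigen V, plasma cells C, antibodies F, organ damage m), uses illustrative parameter values, an assumed damage factor xi(m), a simple fixed-step Euler scheme, and the assumption that no antigen is present before t = 0. In this form the linearization around the healthy state (V = 0, C = C*, F = rho C*/mu_f, m = 0) reduces to dV/dt = (beta - gamma F*) V, which contains no delayed term, in line with the statement that the stability of the healthy state does not depend on the delay.

```python
def xi(m):
    """Assumed damage factor: immune efficiency drops once damage m exceeds 0.1."""
    return 1.0 if m <= 0.1 else max(0.0, (1.0 - m) / 0.9)


def simulate_marchuk(v0, tau, t_end=50.0, dt=0.01,
                     beta=0.5, gamma=0.8, alpha=1000.0, mu_c=0.5,
                     rho=0.17, mu_f=0.17, eta=10.0, sigma=10.0,
                     mu_m=0.12, c_star=1.0):
    """Euler integration with a history buffer for the delayed term.

    Returns the final state (V, C, F, m). All parameter values are
    illustrative assumptions, not values taken from the abstract.
    """
    f_star = rho * c_star / mu_f       # antibody level in the healthy state
    v_hist, f_hist = [v0], [f_star]    # stored V and F values for the delay
    c, m = c_star, 0.0
    lag = int(round(tau / dt))         # delay expressed in time steps
    for i in range(int(round(t_end / dt))):
        v, f = v_hist[i], f_hist[i]
        j = i - lag
        # Assumption: no antigen before t = 0, so V(t - tau) F(t - tau) = 0 for t < tau.
        delayed = v_hist[j] * f_hist[j] if j >= 0 else 0.0
        dv = (beta - gamma * f) * v
        dc = xi(m) * alpha * delayed - mu_c * (c - c_star)
        df = rho * c - (mu_f + eta * gamma * v) * f
        dm = sigma * v - mu_m * m
        v_hist.append(v + dt * dv)
        f_hist.append(f + dt * df)
        c += dt * dc
        m += dt * dm
    return v_hist[-1], c, f_hist[-1], m


if __name__ == "__main__":
    # With beta < gamma * F*, a small antigen dose is cleared for every delay tried.
    for tau in (0.0, 0.5, 2.0):
        print(tau, simulate_marchuk(1e-6, tau))
```

Choosing beta larger than gamma F* in this sketch makes the healthy state unstable, and the long-term behaviour can then be compared across several values of tau.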
∗ To whom the correspondence should be addressed: [email protected]
[1] G.I. Marchuk, R.V. Petrov, A.A. Romanyukha, G.A. Bocharov, J. Theor. Biol., 1991, 151, 1-69
[2] M. Bodnar, U. Foryś, Internat. J. Appl. Math. Comput. Sci., 2000, 10 (I), 101-116
[3] F.M. Asl, A.G. Ulsoy, J. Dyn. Syst. Meas. Cont., 2003, 125 (2), 215-223
A new method to evaluate quality of RNA 3D models
Tomasz Zok,1, ∗ Maciej Antczak,1 Piotr Lukasiak,1, 2 and Marta Szachniuk1, 2
1 Institute of Computing Science, Poznan University of Technology
2 Institute of Bioorganic Chemistry, Polish Academy of Sciences
Computational modelling of RNA 3D structures is attracting ever more attention.
With the advent of new approaches and the constant refinement of existing ones, the complexity
of structures that can be tackled by computational means is rising. At the same time, another
challenge is becoming harder: the evaluation of model quality, especially when one lacks access
to homologous structures.
We propose a new method that addresses the problem of quality evaluation in two ways. Both
rely on RNApdbee [1], which is used to unify and aggregate information about RNA
3D structure from various external tools. In the first mode, the method scores each of
many models on a scale from 0 to 1 and thereby constructs a ranking. This is achieved through
a voting mechanism in which every model provides its input on the correctness
of base-base interactions. A consensus of all inputs is constructed and treated as a
virtual target structure against which all models are ranked by means of Interaction Network
Fidelity (INF). The same virtual structure is used in the second variant of the method. Here, however, each
base-base interaction is individually assessed as right or wrong.
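The voting and ranking steps can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each model has already been reduced to a set of base-base interactions written as residue-index pairs (i, j), uses a hypothetical vote-fraction threshold, and computes INF in its usual form, the square root of the product of positive predictive value and sensitivity. The function names are illustrative.

```python
from collections import Counter
from math import sqrt


def support(models):
    """Fraction of models that contain each interaction (second variant)."""
    votes = Counter(pair for pairs in models.values() for pair in pairs)
    return {pair: count / len(models) for pair, count in votes.items()}


def consensus(models, threshold):
    """Virtual target: interactions backed by at least `threshold` of the models."""
    return {pair for pair, frac in support(models).items() if frac >= threshold}


def inf_score(predicted, reference):
    """Interaction Network Fidelity: sqrt(PPV * sensitivity), in [0, 1]."""
    tp = len(predicted & reference)
    if tp == 0:
        return 0.0
    return sqrt((tp / len(predicted)) * (tp / len(reference)))


def rank_models(models, threshold=0.5):
    """Score every model against the consensus and sort from best to worst."""
    target = consensus(models, threshold)
    scores = {name: inf_score(pairs, target) for name, pairs in models.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy input: three hypothetical models of the same RNA.
    models = {
        "m1": {(1, 10), (2, 9), (3, 8)},
        "m2": {(1, 10), (2, 9), (4, 7)},
        "m3": {(1, 10), (2, 9), (3, 8), (5, 6)},
    }
    for name, score in rank_models(models):
        print(name, round(score, 3))
```

Changing the `threshold` argument changes which interactions enter the virtual target, which is the parameter the consensus depends on in this sketch.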
As both variants rely on the constructed virtual target structure, it is of great importance to
define precisely how the consensus is obtained. We conducted a series of computational experiments
to refine the consensus threshold. Next, the obtained rankings and base-base interaction scores
were compared with our benchmark, built from several past RNA-Puzzles challenges. We found
that recreating the RNA-Puzzles ranking and predicting the INF score are indeed two different
aims, and that each requires a different consensus threshold value. With these values known, we
participated in the newly created Quality Prediction category of the RNA-Puzzles contest.
∗ To whom the correspondence should be addressed: [email protected]
[1] M. Antczak, T. Zok, M. Popenda, P. Lukasiak, R.W. Adamiak, J. Blazewicz & M. Szachniuk. RNApdbee
– a webserver to derive secondary structures from pdb files of knotted and unknotted RNAs. NAR, 2014,
42(W1), W368-W372
Study on correlated mutations in plant proteinase inhibitors - Bowman-Birk
and Kunitz family
Agata Żyźniewska1, ∗ and Jacek Leluk1
1 University of Zielona Góra, Faculty of Biological Sciences,
Department of Biological Sciences, Zielona Góra, Poland
Plant proteinase inhibitors (PIs) are small proteins, generally present at high concentration in
storage tissues, but also detectable in leaves in response to the attack of insects and pathogenic
microorganisms. PIs’ contribution to plant defense mechanisms relies on inhibition of proteinases
present in insects’ guts or produced by microorganisms, causing a reduction in the availability of
amino acids necessary for their growth and development [1]. In plants, there are many different
types of proteinase inhibitors. They belong to a group of anti-nutritional factors. Their effects
on organisms exposed to them may be negative (e.g. trypsin chelation) or positive (e.g. antioxidant
properties).
Bowman-Birk inhibitors (BBI) are small serine proteinase inhibitors found in leguminous
and gramineous plants. Characteristically, their molecular masses are about 7 kDa (about 70
amino acid residues) and they are rich in disulfide bonds. The Bowman-Birk inhibitors are also
recognized as potential cancer chemopreventive agents [2, 3]. Human populations consuming large
amounts of BBI in their diet have been demonstrated to exhibit lower rates of colon, breast, prostate
and skin cancers. In our study of this group of inhibitors, we identified and described 6
clusters of correlated mutations in functionally significant regions.
Plant Kunitz-type inhibitors are present in leguminous seeds. The first inhibitor discovered
in this family (SBTI) was obtained from Glycine max seeds and, over the past three decades, a
large number of other inhibitors have been found. Data on the primary and tertiary
structure of plant Kunitz-type inhibitors help to understand their mechanisms of action in
coagulation, inflammation and tumors, and make it possible to investigate which region of the
protein is responsible for its biological activity [4, 5]. In our study of this protein group, we
described 11 clusters of correlated mutations.
The results of our study were obtained with the aid of new, original software designed for
phylogenetic studies, analysis of mutational variability within homologous proteins, and
identification of correlated mutations.
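The abstract does not say how the software detects correlated mutations. As a purely illustrative stand-in, the sketch below uses mutual information between columns of a multiple sequence alignment, one common measure of co-variation. The alignment, the cutoff value and the function names are hypothetical, and gaps are not treated specially.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(alignment)
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    count_i, count_j = Counter(col_i), Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    return sum(
        (c / n) * log2((c / n) / ((count_i[a] / n) * (count_j[b] / n)))
        for (a, b), c in count_ij.items()
    )


def correlated_pairs(alignment, min_mi):
    """Column pairs whose mutual information reaches the cutoff `min_mi`."""
    length = len(alignment[0])
    return [
        (i, j)
        for i, j in combinations(range(length), 2)
        if mutual_information(alignment, i, j) >= min_mi
    ]


if __name__ == "__main__":
    # Toy alignment: columns 0 and 1 change together, column 2 is conserved.
    alignment = ["ACDA", "ACDA", "GTDA", "GTDC"]
    print(correlated_pairs(alignment, min_mi=0.9))
```

Grouping such pairs into clusters would need an additional step (for example, connected components of the pair graph), which is left out here.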
∗ To whom the correspondence should be addressed: [email protected]
[1] Ryan C.A. (1990), Protease inhibitors in plants: genes for improving defenses against insects and
pathogens, Annu. Rev. of Phytopath., 28, 425-449
[2] Lippman S.M., Matrisian L.M. (2000), Protease Inhibitors in Oral Carcinogenesis and Chemoprevention,
Clinical Cancer Research, Vol. 6, 4599-4603
[3] Jaulent A.M., Leatherbarrow R.J. (2004) Design, synthesis and analysis of novel bicyclic and bifunctional
protease inhibitors, Protein Engineering, Design and Selection, Vol. 17 no. 9, pp. 681-687
[4] Oliva M. L. V., Sampaio M. U. (2009) Action of plant proteinase inhibitors on enzymes of
physiopathological importance, Annals of The Brazilian Academy of Science, 81(3), 615-621
[5] Major I. T., Constabel C. P. (2008) Functional Analysis of the Kunitz Trypsin Inhibitor Family in
Poplar Reveals Biochemical Diversity and Multiplicity in Defense against Herbivores, Plant Physiology,
Vol. 146, pp. 888-903