Two recent applications of graph theory in molecular biology

25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Two recent applications of graph theory in
molecular biology
Debra J. Knisley
25th Clemson Mini-Conference on Discrete Math and Algorithms
October 7, 2010
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
1 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Outline
New applications in molecular biology
Graphs as Secondary RNA structures
Graphs as Amino Acids
Graphs as Proteins
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
2 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
DNA to RNA to Protein
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
3 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
Amino Acid
3-letter
1-letter
Polarity
Alanine
Ala
Arg
Asp
Cys
Glu
Gln
His
Ile
Leu
Lys
A
R
N
C
E
Q
H
I
L
K
nonpolar
Arginine
Asparagine
Cysteine
Glutamic acid
Glycine
Histidine
Isoleucine
Leucine
Lysine
Debra J. Knisley
October 7, 2010
t
polar
polar
polar
nonpolar
polar
polar
nonpolar
polar
nonpolar
Two recent applications of graph theory in molecular biology
4 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Human Genome Project
How many genes are in the Human Genome?
Majority of biologists believed a lot more than
50,000
Rice genome contains about 50,000 genes
Big surprise, humans only have about half that
many!
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
5 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Major changes in the science of molecular
biology
The number of genes in the genome was just the
first of many surprises!
The belief that the sole function of RNA is to carry
the ”code” of instructions for a protein is not true
It is now known that more than half of the RNA
molecules that have been discovered are
”non-coding”
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
6 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Systems Biology
The two surprising discoveries are related–
It is not how many genes, it is how they are
regulated that is important!
Now everyone is trying to understand regulatory
networks - Systems Biology
The study of RNA molecules is at the forefront of
the field of Systems Biology
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
7 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Database of RNA Tree Graphs
Home
RNA Tree Graph Subproject Home
How to Represent RNA Trees as Planar Graphs
List of RNA Tree Graphs by Number of Vertices
2 Vertices
3 Vertices
4 Vertices
5 Vertices
6 Vertices
7 Vertices
8 Vertices
9 Vertices
10 Vertices
List of RNA Tree Graphs by Functional RNA Class
P5abc
RNA in Signal Recognition Complex
Single Strand RNA
tRNA
U2 snRNA
5S rRNA
70S
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
8 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Secondary RNA structure: Tamar Schlick
et.al, web resource RAG (RnaAsGraphs)
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
9 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Vertices represent internal loops
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
10 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Tree model of secondary RNA structure
Secondary RNA structure
And corresponding tree.
See http://www.monod.nyu.edu
(Schlick et al.)
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
11 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
They can be large, but most folding
algorithms predict smaller tree structures
The tree representation
of an RNA molecule’s
secondary structure.
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
12 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
The RAG Database
The six trees of order 6 with their RAG (n.z) index and
color classification
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
13 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Modeling RNA Bonding
When two trees T1 and T2 containing n and m vertices
respectively are merged, the resulting tree T1+2 has a
total of n + m − 1 vertices.
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
14 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Modeling RNA Bonding
To accurately model RNA molecule bonding, we must
consider all possible vertex identifications between two
RNA tree models.
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
15 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Modeling RNA Bonding
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
16 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Modeling RNA Bonding
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
17 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Modeling RNA Bonding
For all 94 graphs on 2 through 9 vertices, we
recorded every possible vertex identification
resulting in a graph on 9 or less vertices
The data from each vertex identification was
recorded and translated into a data vector
[hc1 , c2 , deg(v1 ), deg(v2 )i , hy1 , y2 i]
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
18 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Observations
Examples of Data Vectors from Red Trees and Black
Trees
Red Trees 8.5 and 8.15
Data Vectors
[< 1, 1, 1, 1 >, < 1, 0 >]
[< 1, 1, 1, 2 >, < 1, 0 >]
[< 1, 1, 2, 1 >, < 1, 0 >]
[< 1, 1, 2, 2 >, < 1, 0 >]
Debra J. Knisley
October 7, 2010
Black Trees 8.22 and 8.23
Data Vectors
[< 1, 1, 1, 5 >, < 0, 1 >]
[< 1, 1, 2, 4 >, < 0, 1 >]
[< 1, 1, 3, 4 >, < 0, 1 >]
[< 1, 0, 1, 6 >, < 0, 1 >]
Two recent applications of graph theory in molecular biology
19 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Artificial Neural Network
Each tree’s final classification was calculated as an
average of a linear combination of prediction values from
the vertex identifications
Code
a[8.5] := 2 ∗ Classify(h1, 1, 1, 2i);
b[8.5] := 4 ∗ Classify(h1, 1, 1, 1i);
c[8.5] := 4 ∗ Classify(h1, 1, 1, 1i);
d[8.5] := 4 ∗ Classify(h1, 1, 2, 1i);
Class[8.5] :=
Debra J. Knisley
October 7, 2010
a[8.5]+b[8.5]+c[8.5]+d[8.5]
14
Prediction
h
1.99966, 0.00135i
−7
4.00000, 2.6820110−7 4.00000, 2.6820110
h3.99932, 0.00270i
h0.99991, 0.00034i
Two recent applications of graph theory in molecular biology
20 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Results: Values for the Classified RAG Trees
Debra J. Knisley
RAG
Index
Color
Class
ANN
Prediction
ANN
Result
7.1
7.10
7.11
7.2
7.3
7.9
8.10
8.15
8.17
8.18
9.6
9.11
9.13
9.27
Red
Black
Black
Red
Red
Black
Red
Red
Black
Black
Red
Red
Red
Red
1.00000
0.59091
0.00045
0.99860
0.99990
0.43130
0.99994
0.56343
0.36524
0.00595
0.99991
0.99993
0.99740
0.99795
Highly RNA-Like
Unclassifiable
Highly Not-RNA-Like
Highly RNA-Like
Highly RNA-Like
Unclassifiable
Highly RNA-Like
Unclassifiable
Not-RNA-Like
Highly Not-RNA-Like
Highly RNA-Like
Highly RNA-Like
Highly RNA-Like
Highly RNA-Like
October 7, 2010
Two recent applications of graph theory in molecular biology
21 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
t
22 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Domination number = 4
Domination number = 3
Domination number = 2
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
23 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Locating-dominating number
1
Vertex
9
2
5
6
1
9
2
7
3
4 & 10
5
4
7
8
4
Dominating Vertex
10
3
6
7 & 10
8
4, 7 & 9
Combinatorial optimization
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
24 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
locating domination number = 4
locating domination number = 5
locating domination number = 5
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
25 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Graphical measures
We used two measures defined by domination
invariants
P1 =
γ + γt + γa
,
n
P2 =
γL + γD
n
where γ is the domination number, γt is the total
domination number, γa is the global allied
domination number, γL is the locating domination
number, γD is the differentiating domination
number, and n is the number of vertices in the
graph.
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
26 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Results
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
27 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Combinatorial Molecular Biology?
In combinatorics, we find optimal structural
properties by defining graphical invariants based on
minimal/maximal achievements such as the
domination number, chromatic number, etc.
Can we utilize the graphical invariants from graph
theory as descriptors of biomolecules? That is, do
biomolecules exhibit optimal structural properties?
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
28 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Computational Chemistry- Chemical Graph
Theory
Topological indices together with quantities derived
from chemical properties are called molecular
descriptors
There are over 3,000 molecular descriptors in the
Handbook of Molecular Descriptors. Of those,
approximately 1600 can be classified as topological
indices.
A tool of QSAR, the process by which chemical
structure is quantitatively correlated with a well
defined process, such as biological activity or
chemical reactivity.
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
29 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
The dilemma
It is generally believed that topological indices are
not valid for large graphs
Unlike RNA, proteins are very large molecules
Graphical invariants are suited for large graphs, but
are often intractable
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
30 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Determining the function of a protein
When a new protein is discovered, we may not
know its function
There are two ways to address the question of
function
Compare its primary sequence (sequence of amino
acids) to all other known proteins
Predict (or determine) its 3D conformation and
compare its conformation to all other known 3D
conformations
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
31 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Hamming distance
R-T-S-C-A-G-G-W-A-C
R-T-S-C-Y-Y-W-G-A-C
target sequence
second sequence
R-T-S-C-A-G-G-W-A-C
target sequence
K-T-S-C-G-A-G-Y-A-C
third sequence
What is the Hamming distance in each case.
Are these “equally close”
No, some amino acids are more likely to substitute
for each other than other amino acids. There exist
Scoring Matrices (often based on the likelyhood of
one amino acid being replaced by another)
See: AAindex database
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
32 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Graphs for amino acids
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
33 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Graphs for amino acids
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
34 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Table of graph invariants from
vertex-weighted graphs
Row
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
Debra J. Knisley
October 7, 2010
acid
A
R
N
D
C
Q
E
G
H
I
L
K
M
F
P
S
T
W
Y
V
G
12
54
42
44
12
42
44
0
40
24
36
38
44
36
24
12
12
62
52
12
g
12
36
24
24
12
24
24
0
36
24
24
24
24
24
12
12
12
48
24
12
d
2
9
5
5
3
6
6
0
6
5
5
8
6
7
4
3
3
9
8
3
c
0
0
0
0
0
0
0
0
5
0
0
0
0
6
4
0
0
9
6
0
m
1.000
1.750
1.600
1.600
1.333
1.667
1.667
0.000
2.000
1.600
1.600
1.667
1.600
2.000
2.000
1.333
1.500
2.182
2.000
1.500
p
12
14
15
16
32
15
16
0
14
12
12
14
12
12
12
16
14
12
16
12
Two recent applications of graph theory in molecular biology
35 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Selecting the invariants
Dendrogram using 5 invariants
Ward Linkage, Euclidean Distance
Distance
14.81
9.87
4.94
0.00
A
T
V
S
P
G
C
R
N
D
Q
E
I
L
M
K
H
F
Y
W
Amino acids
Dendrogram based on invariants g_1, d, c and p
Ward Linkage, Euclidean Distance
Distance
We see that the measure
m - average degree
changed the relationship of R
and K quite a bit.
10.88
7.25
3.63
0.00
A
Debra J. Knisley
October 7, 2010
T
V
S
P
G
C
R
K N D Q
Amino acids
E
I
L
M
H
F
Two recent applications of graph theory in molecular biology
Y
W
36 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Selecting the invariants
Four graphical invariants that capture the basic
properties of the amino acids
Dendrogrambased on invariants g_1, d, c and p
Ward Linkage, Euclidean Distance
Distance
10.88
7.25
3.63
0.00
A T V S P G C R K N D Q E I L M H F Y W
Amino acids
Possible add-on tool for scoring multiple sequence alignments
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
37 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
What is reverse engineering?
Building a mathematical model from observations
The model is a representation of the biomolecule,
network or whatever system under observation
The model is adjusted to fit the observations
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
38 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
The DREAM CHALLENGE
What is DREAM?
Discussions for Reverse Engineering Assessment
and Methods
“The fundamental question for DREAM is simple:
How can researchers assess how well they are
describing the biological systems?”
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
39 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Sponsors of DREAM
Columbian University Center for Multiscale Analysis
Genomic and Cellular Networks (MAGNet)
NIH Roadmap Initiative
IBM Computational Biology Center
The New York Academy of Sciences
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
40 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
A Challenge is posted each year: An
opportunity for “assessment”
DREAM4(2009)- Organized by Gustavo Stolovitzky,
IBM Computational Biology Center and Andrea
Califano, Columbia University
The PDZ challenge – Given the primary sequence of
5 proteins, almost identical, can you predict the
binding target of each one. That is, what is the
binding affinity difference of each ”mutant”.
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
41 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
The Challenge
There is a binding pocket in each PDZ domain
Mutations in the binding pocket result in a change
in the binding affinity
Given a mutation, can you predict the change in
binding affinity?
A single amino acid change can have ”synergistic”
consequences
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
42 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Higlighted amino acids are attracted to
different amino acids in the target protein
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
43 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Our Approach
Debra J. Knisley
1
Use the graph-theoretic models and corresponding
graphical invariants of individual amino acids
2
Replace each alphabet string representing the
primary structure of each domain (and
corresponding target) with a numerical string. The
numerical string was determined by the observation
of the properties of the amino acids.
3
Train a neural network using these numerical strings
as inputs.
October 7, 2010
Two recent applications of graph theory in molecular biology
44 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Conclusion
“The idea of optimization is intimately connected with
modern science. Pioneers like Galileo, Fermat and
Newton, were convinced that the world had been created
by a benevolent God who had established the laws of
nature as the most efficient way to achieve his purposes:
in short, this is the best of all possible worlds and it is
the task of science to find out why and how. Gradually
this view was overturned, leaving optimization as an
important tool for the human-engineered world.”
Taken from the abstract of Ivar Ekeland, University of
British Columbia, for his upcoming Math Matters
Lecture on The Best of All Possible Worlds: The Idea of
Optimization.
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
45 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
Joint work with a number of people at
ETSU
Jeff Knisley- applied mathematics
Teresa Haynes- graph theory
Edith Seier-statistics
Denise Koessler- Mathematics graduate student
Cade Herron and Jordon Shipley- Mathematics
undergraduate students
– Acknowledgement: Tamar Schlick and the web
resource folks at NYU
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
46 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
References
G. Benedetti and S. Morosetti, A Graph-Topological Approach to
Recognition of Pattern and Similarity in RNA Secondary
Structures, Biol Chem. 22 (1996) 179-184.
N. Bose and P. Liang, Neural Network Fundamentals with Graphs,
Algorithms, and Applications, New York: McGraw-Hill (1996).
G. Chartrand, Introductory Graph Theory, Boston: Prindle, Weber
& Schmitt (1977).
H. Gan, S. Pasquali, and T. Schlick, Exploring the Repertoire of
RNA Secondary Motifs Using Graph Theory: Implications for RNA
Design, Nucleic Acids Research, 31 (2003) 2926-2943.
F. Harary and G. Prings, The Number of Homeomorphically
Irreducible Trees and Other Species, Acta Math, 101 (1959)
141-162.
T. Haynes, S. Hedetniemi, and P. Slater, Fundamentals of
Domination in Graphs, New York: Marcel Dekker, Inc. (1998).
D. Knisley, T. Haynes T, E. Seier, and Y. Zou, A Quantitative
Analysis of Secondary RNA Structure Using Domination Based
Parameters on Trees, BMC Bioinformatics 7 (2006) 108.
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
47 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
References
D. Knisley D, T. Haynes T, and J. Knisley, Using a Neural Network
to Identify Secondary RNA Structures Quantified by Graphical
Invariants, Communications in Mathematical and in Computer
Chemistry / MATCH, 60 (2008) 277-290.
S. Le, R. Nussinov, and J. Maziel, Tree Graphs of RNA Secondary
Structures and their Comparison, Comp. Biomed. Res., 22 (1989)
461-473.
M. Nelson and S. Istrail, RNA Structure and Prediction
Computational Molecular Biology,
[http://tuvalu.santafe.edu/ pth/rna.html], (1996).
T. Schlick, D. Fera, N. Kim, N. Shiffeldrim, J. Zorn, U. Laserson,
and H. Gan, RAG: RNA-As-Graphs Web Resource, BMC
Bioinformatics, 5 (2004) 88.
T. Schlick, D. Fera, N. Kim, N. Shiffeldrim, J. Zorn, U. Laserson,
and H. Gan, RAG: RNA as Graphs Web Resource,
[http://monod.biomath.nyu.edu/rna/rna.php], (2007).
T. Schlick, D. Fera, N. Kim, N. Shiffeldrim, J. Zorn, U. Laserson,
H. Gan, and M. Tang, RAG: RNA-As-Graphs Database-Concepts,
Analysis, and Features, Bioinformatics, 20 (2004) 1285-1291.
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
48 / 50
25th Clemson Mini-Conference on
Discrete Math and Algorithms
t
References
T. Schlick, N. Kim, N. Shiffeldrim, and H. Gan, Candidates for
Novel RNA Topologies, J Mol Biol, 341 (2004) 1129-1144.
J. Shao, Linear model selection by cross-validation, J. Am.
Statistical Association, 88 (1993) 486-494.
S. Vishveshwara, K. V. Brinda, and N. Kannan, Protein Structure:
Insights from Graph Theory, Journal of Theoretical and
Computational Chemistry, 1 (2002) 187-211.
Debra J. Knisley
October 7, 2010
Two recent applications of graph theory in molecular biology
49 / 50