Two recent applications of graph theory in molecular biology
Debra J. Knisley
25th Clemson Mini-Conference on Discrete Math and Algorithms
October 7, 2010

Outline
- New applications in molecular biology
- Graphs as Secondary RNA structures
- Graphs as Amino Acids
- Graphs as Proteins

DNA to RNA to Protein
[Figure: DNA to RNA to protein.]

Amino Acid      3-letter  1-letter  Polarity
Alanine         Ala       A         nonpolar
Arginine        Arg       R         polar
Asparagine      Asn       N         polar
Aspartic acid   Asp       D         polar
Cysteine        Cys       C         nonpolar
Glutamic acid   Glu       E         polar
Glutamine       Gln       Q         polar
Glycine         Gly       G         nonpolar
Histidine       His       H         polar
Isoleucine      Ile       I         nonpolar
Leucine         Leu       L         nonpolar
Lysine          Lys       K         polar

Human Genome Project
- How many genes are in the human genome?
- The majority of biologists believed there were many more than 50,000.
- The rice genome contains about 50,000 genes.
- Big surprise: humans have only about half that many!

Major changes in the science of molecular biology
- The number of genes in the genome was just the first of many surprises!
- The belief that the sole function of RNA is to carry the "code" of instructions for a protein turned out to be false.
- It is now known that more than half of the RNA molecules that have been discovered are "non-coding".

Systems Biology
- The two surprising discoveries are related: it is not how many genes there are, but how they are regulated, that is important!
- Now everyone is trying to understand regulatory networks: Systems Biology.
- The study of RNA molecules is at the forefront of the field of Systems Biology.

Database of RNA Tree Graphs
[Screenshot of the RNA Tree Graph Subproject menu: How to Represent RNA Trees as Planar Graphs; List of RNA Tree Graphs by Number of Vertices (2 through 10 vertices); List of RNA Tree Graphs by Functional RNA Class (P5abc, RNA in Signal Recognition Complex, Single Strand RNA, tRNA, U2 snRNA, 5S rRNA, 70S).]

Secondary RNA structure
Tamar Schlick et al., web resource RAG (RNA-As-Graphs)

Vertices represent internal loops
[Figure.]

Tree model of secondary RNA structure
[Figure: secondary RNA structure and corresponding tree.] See http://www.monod.nyu.edu (Schlick et al.)

RNA trees can be large, but most folding algorithms predict smaller tree structures
[Figure: the tree representation of an RNA molecule's secondary structure.]

The RAG Database
[Figure: the six trees of order 6 with their RAG (n.z) index and color classification.]

Modeling RNA Bonding
- When two trees $T_1$ and $T_2$, containing $n$ and $m$ vertices respectively, are merged by identifying a vertex of $T_1$ with a vertex of $T_2$, the resulting tree $T_{1+2}$ has a total of $n + m - 1$ vertices.
- To accurately model RNA molecule bonding, we must consider all possible vertex identifications between two RNA tree models.
[Figures: examples of vertex identifications between two RNA trees.]
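
The vertex identification described on the Modeling RNA Bonding slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: trees are given as edge lists on vertices 0..n-1, and the function names (`identify_vertices`, `all_identifications`, `degrees`) are hypothetical, not taken from the talk. The sketch does not remove isomorphic duplicates among the merged trees.

```python
from itertools import product


def identify_vertices(edges1, n, edges2, m, u, v):
    """Merge T1 (vertices 0..n-1) and T2 (vertices 0..m-1) by identifying
    vertex u of T1 with vertex v of T2. Returns (edge list, vertex count)."""
    # Vertex v of T2 takes the label u; every other vertex of T2 gets a
    # fresh label n, n+1, ..., so the merged tree has n + m - 1 vertices.
    relabel = {v: u}
    fresh = n
    for w in range(m):
        if w != v:
            relabel[w] = fresh
            fresh += 1
    merged = list(edges1) + [(relabel[a], relabel[b]) for a, b in edges2]
    return merged, n + m - 1


def degrees(edges, order):
    """Degree of every vertex in a graph given as an edge list."""
    deg = [0] * order
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def all_identifications(edges1, n, edges2, m):
    """Yield every one of the n * m possible vertex identifications."""
    for u, v in product(range(n), range(m)):
        yield (u, v), identify_vertices(edges1, n, edges2, m, u, v)


# Example (hypothetical input): a path on 3 vertices and a star on 4 vertices,
# merged at the middle of the path and the center of the star.
path3 = [(0, 1), (1, 2)]
star4 = [(0, 1), (0, 2), (0, 3)]
merged, order = identify_vertices(path3, 3, star4, 4, 1, 0)
print(order, len(merged), degrees(merged, order)[1])
```

The merged graph is again a tree: it has n + m - 1 vertices and (n - 1) + (m - 1) edges, and the degree of the identified vertex is the sum of the two original degrees.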

Modeling RNA Bonding
- For all 94 graphs on 2 through 9 vertices, we recorded every possible vertex identification resulting in a graph on 9 or fewer vertices.
- The data from each vertex identification was recorded and translated into a data vector [<c_1, c_2, deg(v_1), deg(v_2)>, <y_1, y_2>].

Observations
Examples of data vectors from red trees and black trees

Red Trees 8.5 and 8.15      Black Trees 8.22 and 8.23
Data Vectors                Data Vectors
[<1, 1, 1, 1>, <1, 0>]      [<1, 1, 1, 5>, <0, 1>]
[<1, 1, 1, 2>, <1, 0>]      [<1, 1, 2, 4>, <0, 1>]
[<1, 1, 2, 1>, <1, 0>]      [<1, 1, 3, 4>, <0, 1>]
[<1, 1, 2, 2>, <1, 0>]      [<1, 0, 1, 6>, <0, 1>]

Artificial Neural Network
Each tree's final classification was calculated as an average of a linear combination of prediction values from the vertex identifications.

Code                                                  Prediction
a[8.5] := 2 * Classify(<1, 1, 1, 2>);                 <1.99966, 0.00135>
b[8.5] := 4 * Classify(<1, 1, 1, 1>);                 <4.00000, 2.68201 x 10^-7>
c[8.5] := 4 * Classify(<1, 1, 1, 1>);                 <4.00000, 2.68201 x 10^-7>
d[8.5] := 4 * Classify(<1, 1, 2, 1>);                 <3.99932, 0.00270>
Class[8.5] := (a[8.5] + b[8.5] + c[8.5] + d[8.5])/14  <0.99991, 0.00034>

Results: Values for the Classified RAG Trees

RAG Index  Color Class  ANN Prediction  ANN Result
7.1        Red          1.00000         Highly RNA-Like
7.10       Black        0.59091         Unclassifiable
7.11       Black        0.00045         Highly Not-RNA-Like
7.2        Red          0.99860         Highly RNA-Like
7.3        Red          0.99990         Highly RNA-Like
7.9        Black        0.43130         Unclassifiable
8.10       Red          0.99994         Highly RNA-Like
8.15       Red          0.56343         Unclassifiable
8.17       Black        0.36524         Not-RNA-Like
8.18       Black        0.00595         Highly Not-RNA-Like
9.6        Red          0.99991         Highly RNA-Like
9.11       Red          0.99993         Highly RNA-Like
9.13       Red          0.99740         Highly RNA-Like
9.27       Red          0.99795         Highly RNA-Like

Domination number
[Figure: three example graphs with domination number = 4, domination number = 3, and domination number = 2.]

Locating-dominating number
[Figure: graph on vertices 1 through 10.]

Vertex  Dominating Vertex
1       9
2       7
3       4 & 10
5       4
6       7 & 10
8       4, 7 & 9

Combinatorial optimization

[Figure: three example graphs with locating domination number = 4, locating domination number = 5, and locating domination number = 5.]

Graphical measures
We used two measures defined by domination invariants:

$P_1 = \frac{\gamma + \gamma_t + \gamma_a}{n}, \qquad P_2 = \frac{\gamma_L + \gamma_D}{n}$

where $\gamma$ is the domination number, $\gamma_t$ is the total domination number, $\gamma_a$ is the global allied domination number, $\gamma_L$ is the locating domination number, $\gamma_D$ is the differentiating domination number, and $n$ is the number of vertices in the graph.
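
For the small trees considered here (at most 10 vertices), some of the domination invariants above can be found by exhaustive search. The following is a minimal sketch, not the method used in the talk: it assumes the standard definitions of dominating, total dominating, and locating-dominating sets, covers only $\gamma$, $\gamma_t$, and $\gamma_L$ (the global allied and differentiating numbers are omitted), and all function names are hypothetical.

```python
from itertools import combinations


def adjacency(n, edges):
    """Adjacency sets for a graph on vertices 0..n-1."""
    adj = [set() for _ in range(n)]
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def is_dominating(S, adj):
    # Every vertex is in S or has a neighbor in S.
    return all(v in S or adj[v] & S for v in range(len(adj)))


def is_total_dominating(S, adj):
    # Every vertex, including those in S, has a neighbor in S.
    return all(adj[v] & S for v in range(len(adj)))


def is_locating_dominating(S, adj):
    # S dominates, and vertices outside S have pairwise distinct
    # sets of neighbors inside S.
    if not is_dominating(S, adj):
        return False
    traces = [frozenset(adj[v] & S) for v in range(len(adj)) if v not in S]
    return len(traces) == len(set(traces))


def minimum_size(adj, predicate):
    """Size of a smallest vertex set satisfying predicate (brute force)."""
    n = len(adj)
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            if predicate(set(subset), adj):
                return k
    return None


# Example (hypothetical input): the path on 5 vertices.
path5 = adjacency(5, [(0, 1), (1, 2), (2, 3), (3, 4)])
gamma = minimum_size(path5, is_dominating)
gamma_t = minimum_size(path5, is_total_dominating)
gamma_L = minimum_size(path5, is_locating_dominating)
print(gamma, gamma_t, gamma_L)
```

The search is exponential in the number of vertices, which is acceptable for RNA trees of this size but not for large graphs.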

Results
[Figure.]

Combinatorial Molecular Biology?
- In combinatorics, we find optimal structural properties by defining graphical invariants based on minimal/maximal achievements such as the domination number, chromatic number, etc.
- Can we utilize the graphical invariants from graph theory as descriptors of biomolecules? That is, do biomolecules exhibit optimal structural properties?

Computational Chemistry: Chemical Graph Theory
- Topological indices together with quantities derived from chemical properties are called molecular descriptors.
- There are over 3,000 molecular descriptors in the Handbook of Molecular Descriptors. Of those, approximately 1,600 can be classified as topological indices.
- Molecular descriptors are a tool of QSAR, the process by which chemical structure is quantitatively correlated with a well-defined process, such as biological activity or chemical reactivity.

The dilemma
- It is generally believed that topological indices are not valid for large graphs.
- Unlike RNA, proteins are very large molecules.
- Graphical invariants are suited for large graphs, but are often intractable to compute.

Determining the function of a protein
- When a new protein is discovered, we may not know its function.
- There are two ways to address the question of function:
  1. Compare its primary sequence (sequence of amino acids) to all other known proteins.
  2. Predict (or determine) its 3D conformation and compare its conformation to all other known 3D conformations.

Hamming distance

R-T-S-C-A-G-G-W-A-C   target sequence
R-T-S-C-Y-Y-W-G-A-C   second sequence

R-T-S-C-A-G-G-W-A-C   target sequence
K-T-S-C-G-A-G-Y-A-C   third sequence

- What is the Hamming distance in each case? Are these "equally close"?
- The Hamming distance is 4 in both cases, but the sequences are not equally close: some amino acids are more likely to substitute for each other than others.
- There exist scoring matrices (often based on the likelihood of one amino acid being replaced by another). See: AAindex database.

Graphs for amino acids
[Figures: graph models of the amino acids.]

Table of graph invariants from vertex-weighted graphs

Row  acid  G   g   d  c  m      p
1    A     12  12  2  0  1.000  12
2    R     54  36  9  0  1.750  14
3    N     42  24  5  0  1.600  15
4    D     44  24  5  0  1.600  16
5    C     12  12  3  0  1.333  32
6    Q     42  24  6  0  1.667  15
7    E     44  24  6  0  1.667  16
8    G     0   0   0  0  0.000  0
9    H     40  36  6  5  2.000  14
10   I     24  24  5  0  1.600  12
11   L     36  24  5  0  1.600  12
12   K     38  24  8  0  1.667  14
13   M     44  24  6  0  1.600  12
14   F     36  24  7  6  2.000  12
15   P     24  12  4  4  2.000  12
16   S     12  12  3  0  1.333  16
17   T     12  12  3  0  1.500  14
18   W     62  48  9  9  2.182  12
19   Y     52  24  8  6  2.000  16
20   V     12  12  3  0  1.500  12

Selecting the invariants
[Figure: dendrogram using 5 invariants; Ward linkage, Euclidean distance. Leaf order: A T V S P G C R N D Q E I L M K H F Y W.]
[Figure: dendrogram based on invariants g_1, d, c and p; Ward linkage, Euclidean distance. Leaf order: A T V S P G C R K N D Q E I L M H F Y W.]
We see that the measure m (average degree) changed the relationship of R and K quite a bit.

Selecting the invariants
- Four graphical invariants that capture the basic properties of the amino acids.
[Figure: dendrogram based on invariants g_1, d, c and p; Ward linkage, Euclidean distance. Leaf order: A T V S P G C R K N D Q E I L M H F Y W.]
- Possible add-on tool for scoring multiple sequence alignments.

What is reverse engineering?
- Building a mathematical model from observations.
- The model is a representation of the biomolecule, network, or whatever system is under observation.
- The model is adjusted to fit the observations.
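
Returning to the Hamming distance example and the invariant table: the sketch below compares the three example sequences in two ways, first by Hamming distance and then by summing, position by position, the Euclidean distance between the invariant vectors of the two residues. It is an illustration only, resting on several assumptions: the invariant called g_1 in the dendrograms is taken to be the g column of the table, the values are used unscaled, and the function names are hypothetical. It is not the scoring scheme used in the talk.

```python
import math

# (g, d, c, p) for each amino acid, copied from the invariant table.
# Assumption: the dendrogram's g_1 is the table's g column.
INVARIANTS = {
    "A": (12, 2, 0, 12), "R": (36, 9, 0, 14), "N": (24, 5, 0, 15),
    "D": (24, 5, 0, 16), "C": (12, 3, 0, 32), "Q": (24, 6, 0, 15),
    "E": (24, 6, 0, 16), "G": (0, 0, 0, 0), "H": (36, 6, 5, 14),
    "I": (24, 5, 0, 12), "L": (24, 5, 0, 12), "K": (24, 8, 0, 14),
    "M": (24, 6, 0, 12), "F": (24, 7, 6, 12), "P": (12, 4, 4, 12),
    "S": (12, 3, 0, 16), "T": (12, 3, 0, 14), "W": (48, 9, 9, 12),
    "Y": (24, 8, 6, 16), "V": (12, 3, 0, 12),
}


def hamming(seq1, seq2):
    """Number of positions at which two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(seq1, seq2))


def invariant_distance(seq1, seq2):
    """Sum over positions of the Euclidean distance between the
    (unscaled) invariant vectors of the aligned residues."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    return sum(
        math.dist(INVARIANTS[a], INVARIANTS[b]) for a, b in zip(seq1, seq2)
    )


target = "R-T-S-C-A-G-G-W-A-C".split("-")
second = "R-T-S-C-Y-Y-W-G-A-C".split("-")
third = "K-T-S-C-G-A-G-Y-A-C".split("-")

for name, seq in (("second", second), ("third", third)):
    print(name, hamming(target, seq), round(invariant_distance(target, seq), 2))
```

Under these assumptions both sequences are at Hamming distance 4 from the target, while the invariant-based distance ranks the third sequence as closer than the second. In practice the columns would likely need rescaling so that no single invariant dominates the distance.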

The DREAM Challenge
- What is DREAM? Dialogue for Reverse Engineering Assessments and Methods.
- "The fundamental question for DREAM is simple: How can researchers assess how well they are describing the biological systems?"

Sponsors of DREAM
- Columbia University Center for Multiscale Analysis of Genomic and Cellular Networks (MAGNet)
- NIH Roadmap Initiative
- IBM Computational Biology Center
- The New York Academy of Sciences

A Challenge is posted each year: an opportunity for "assessment"
- DREAM4 (2009): organized by Gustavo Stolovitzky, IBM Computational Biology Center, and Andrea Califano, Columbia University.
- The PDZ challenge: given the primary sequences of 5 almost identical proteins, can you predict the binding target of each one? That is, what is the difference in binding affinity for each "mutant"?

The Challenge
- There is a binding pocket in each PDZ domain.
- Mutations in the binding pocket result in a change in the binding affinity.
- Given a mutation, can you predict the change in binding affinity?
- A single amino acid change can have "synergistic" consequences.

[Figure: highlighted amino acids are attracted to different amino acids in the target protein.]

Our Approach
1. Use the graph-theoretic models and corresponding graphical invariants of individual amino acids.
2. Replace each alphabet string representing the primary structure of each domain (and corresponding target) with a numerical string. The numerical string was determined by observing the properties of the amino acids.
3. Train a neural network using these numerical strings as inputs.

Conclusion
"The idea of optimization is intimately connected with modern science. Pioneers like Galileo, Fermat and Newton, were convinced that the world had been created by a benevolent God who had established the laws of nature as the most efficient way to achieve his purposes: in short, this is the best of all possible worlds and it is the task of science to find out why and how. Gradually this view was overturned, leaving optimization as an important tool for the human-engineered world."
Taken from the abstract of Ivar Ekeland, University of British Columbia, for his upcoming Math Matters Lecture on The Best of All Possible Worlds: The Idea of Optimization.

Joint work with a number of people at ETSU
- Jeff Knisley: applied mathematics
- Teresa Haynes: graph theory
- Edith Seier: statistics
- Denise Koessler: Mathematics graduate student
- Cade Herron and Jordon Shipley: Mathematics undergraduate students
Acknowledgement: Tamar Schlick and the web resource folks at NYU

References
G. Benedetti and S. Morosetti, A Graph-Topological Approach to Recognition of Pattern and Similarity in RNA Secondary Structures, Biol Chem. 22 (1996) 179-184.
N. Bose and P. Liang, Neural Network Fundamentals with Graphs, Algorithms, and Applications, New York: McGraw-Hill (1996).
G. Chartrand, Introductory Graph Theory, Boston: Prindle, Weber & Schmidt (1977).
H. Gan, S. Pasquali, and T. Schlick, Exploring the Repertoire of RNA Secondary Motifs Using Graph Theory: Implications for RNA Design, Nucleic Acids Research, 31 (2003) 2926-2943.
F. Harary and G. Prins, The Number of Homeomorphically Irreducible Trees and Other Species, Acta Math, 101 (1959) 141-162.
T. Haynes, S. Hedetniemi, and P. Slater, Fundamentals of Domination in Graphs, New York: Marcel Dekker, Inc. (1998).
D. Knisley, T. Haynes, E. Seier, and Y. Zou, A Quantitative Analysis of Secondary RNA Structure Using Domination Based Parameters on Trees, BMC Bioinformatics, 7 (2006) 108.
D. Knisley, T. Haynes, and J. Knisley, Using a Neural Network to Identify Secondary RNA Structures Quantified by Graphical Invariants, Communications in Mathematical and in Computer Chemistry / MATCH, 60 (2008) 277-290.
S. Le, R. Nussinov, and J. Maizel, Tree Graphs of RNA Secondary Structures and their Comparison, Comp. Biomed. Res., 22 (1989) 461-473.
M. Nelson and S. Istrail, RNA Structure and Prediction Computational Molecular Biology, [http://tuvalu.santafe.edu/ pth/rna.html], (1996).
T. Schlick, D. Fera, N. Kim, N. Shiffeldrim, J. Zorn, U. Laserson, and H. Gan, RAG: RNA-As-Graphs Web Resource, BMC Bioinformatics, 5 (2004) 88.
T. Schlick, D. Fera, N. Kim, N. Shiffeldrim, J. Zorn, U. Laserson, and H. Gan, RAG: RNA as Graphs Web Resource, [http://monod.biomath.nyu.edu/rna/rna.php], (2007).
T. Schlick, D. Fera, N. Kim, N. Shiffeldrim, J. Zorn, U. Laserson, H. Gan, and M. Tang, RAG: RNA-As-Graphs Database-Concepts, Analysis, and Features, Bioinformatics, 20 (2004) 1285-1291.
T. Schlick, N. Kim, N. Shiffeldrim, and H. Gan, Candidates for Novel RNA Topologies, J Mol Biol, 341 (2004) 1129-1144.
J. Shao, Linear model selection by cross-validation, J. Am. Statistical Association, 88 (1993) 486-494.
S. Vishveshwara, K. V. Brinda, and N. Kannan, Protein Structure: Insights from Graph Theory, Journal of Theoretical and Computational Chemistry, 1 (2002) 187-211.