Graph - Lehrstuhl für Effiziente Algorithmen

Basic Data Structures for
Graph based Visualization and
Analysis of Metabolic Networks
Jan Griebsch & Arno Buchner & Hanjo Täubig
Lehrstuhl für Effiziente Algorithmen
Prof. E.W. Mayr
Institut für Informatik, TU München
BFAM Workshop
16.-17.01.2004
Outline
• Application Requirements
• Related Work
  - Graph Concepts
  - Existing Software
• Conclusions for Data Models
• A Test Case
User-defined Requirement Profile
• Work with (metabolic) networks of up to several thousand nodes (reactions, substrates)
  - Store arbitrary context information for each node
  - Search for, filter, and extract enzymes, metabolites, and pathways/subnetworks according to user-defined criteria
• Visualization of such networks
  - Support expanding and contracting (meta-)nodes
• Enable the efficient use of graph algorithms
• Accommodate abstractions such as clusters of nodes
Compound Graphs
Definition A compound graph C = (G,D) consists of a
graph G = (V,EG) and a directed acyclic graph
D=(V,ED) that share the same set of nodes.
[Sugiyama and Misue 1991]
Clustered Graphs
Definition A clustered graph C = (G, T) consists of a base
graph G and a rooted tree T, such that the leaves of T
are exactly the vertices of G.
[Eades and Feng, 1996]
Graph Views Concept
Definition Let G = (VG,EG) be the base graph. The
hierarchy is defined by the tree T = (VT,ET), with the
leaves L(T) = VG. A view is defined as a subset of VT that
induces a partition of VG.
[Buchsbaum and Westbrook 2000]
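The view definition can be made concrete with a small check. The following is only a sketch: the child-map representation of T, the node names, and the helper names `leaves_below` and `is_view` are assumptions made for illustration and are not taken from the cited paper.

```python
def leaves_below(children, node):
    """Return the set of leaves of T in the subtree rooted at node."""
    kids = children.get(node, [])
    if not kids:
        return {node}
    result = set()
    for child in kids:
        result |= leaves_below(children, child)
    return result


def is_view(children, root, candidate):
    """True if the leaf sets below the candidate nodes partition L(T) = VG."""
    all_leaves = leaves_below(children, root)
    covered = set()
    for node in candidate:
        block = leaves_below(children, node)
        if covered & block:  # two candidate nodes share a leaf: no partition
            return False
        covered |= block
    return covered == all_leaves


# Hypothetical hierarchy: two pathway nodes grouping four base-graph vertices.
children = {
    "root": ["glycolysis", "tca"],
    "glycolysis": ["v1", "v2"],
    "tca": ["v3", "v4"],
}
```

In this toy tree, `{"glycolysis", "tca"}` and `{"glycolysis", "v3", "v4"}` are views, while `{"glycolysis", "v3"}` is not, because the leaf `v4` is left uncovered.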
Existing Graph Software
Software/Libraries
  - LEDA, Boost, GTL
    - no concept of hierarchies
  - WilmaScope, GVF
    - clusters
    - no concept of views
Graph Class Diagram
Example: WilmaScope
GraphElement
  + data : Data
  + redraw () : void
  + toString () : String
Data
  + id : Integer
  + name : String
  <<constructor>> + Data (id : int)
  <<constructor>> + Data (id : int, name : String)
  <<getter>> + getID () : Integer
  <<setter>> + setName (name : String) : void
GraphNode
  # edges : Vector
  <<constructor>> + GraphNode (data : Data)
  <<getter>> + getEdgesIterator () : Iterator
  + addEdge (edge : Edge) : void
  + removeEdge (edge : Edge) : void
Edge
  + startNode : GraphNode
  + endNode : GraphNode
  + directed : boolean = false
  <<constructor>> + Edge (data : Data)
  <<setter>> + setStartNode (node : GraphNode) : void
  <<setter>> + setEndNode (node : GraphNode) : void
GraphControl
ClusterNode
  - members : Vector
  - internalEdges : Vector
  <<constructor>> ~ ClusterNode (id : int)
  <<getter>> + getNodes () : Vector
  + containsNode : boolean
  + addNode (node : GraphNode) : void
  + removeNode (node : GraphNode) : void
  + addInternalEdge (edge : Edge) : void
Graph
  # clusters : Vector
  # nodes : Vector
  # edges : Vector
  <<constructor>> + Graph ()
  <<getter>> + getNodes () : Vector
  <<getter>> + getParentClusters (node : GraphNode) : Vector
  + containsNode : boolean
  + addNode (node : GraphNode) : void
  + removeNode (node : GraphNode) : void
  + addEdge (edge : Edge) : void
  + removeEdge (edge : Edge) : void
  + createCluster (clusteredNodes : Vector) : ClusterNode
Biochemical Visualisation and Analysis
Framework for Metabolic Networks (BVAM)
GUI: General User Interface
Graph Analysis Tools
Graph
Graph Visualisation Tools
Moses (CCC Group)
Data Exchange Layer
BioPath Database (CCC)
Datasources: KEGG, BRENDA, WIT
Class Diagram
Classes: GraphElement, Node, Relation, Edge, Graph, View, Hierarchy
GraphElement
  + PropertyMap : HashMap
  GraphElement()
  add_attribute(keytype, valuetype)
  remove_attribute(keytype)
  has_attribute(keytype)
  get_value(keytype) : value
Graph
  # adj : List<Edge>
  # nodes : List<Node>
  Graph()
  add_node(Node)
  add_edge(Node, Node) : bool
  remove_node(Node)
  remove_edge(Edge)
View
  view(hierarchy&)
  expand(node)
  collapse(node)
Hierarchy
  hierarchy(Graph&)
  + add_node(Node)
  + remove_node(Node)
  + father_edge() : Edge
  + son_edges() : edge_iterator
  + is_predecessor(Node, Node) : bool
  + induced_edge(Node, Node) : bool
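The property map on GraphElement is what lets every node or edge carry arbitrary context information. A minimal sketch of that idea follows; the method names mirror the diagram, while the dictionary-based storage, the error behaviour, and the example key are assumptions.

```python
class GraphElement:
    """Base class carrying a property map (method names follow the diagram)."""

    def __init__(self):
        self._properties = {}  # plays the role of PropertyMap : HashMap

    def add_attribute(self, key, value):
        self._properties[key] = value

    def remove_attribute(self, key):
        # Assumption: removing an unknown key is silently ignored.
        self._properties.pop(key, None)

    def has_attribute(self, key):
        return key in self._properties

    def get_value(self, key):
        # Assumption: unknown keys raise KeyError.
        return self._properties[key]


class Node(GraphElement):
    """A graph node; inherits the property map from GraphElement."""


node = Node()
node.add_attribute("name", "PYRUVATE")  # hypothetical attribute key
```

Because the map lives in the base class, nodes, edges, and relations can all be annotated in the same way without changing their class definitions.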
Class Diagram
GraphElement
Node
Relation
Edge
Graph
View
Hierarchy
How can arbitrarily many
hierarchies and views be
modelled?
Observer Pattern
View and Hierarchy are
updated through callbacks
[Raitner, 2003].
Class Diagram
GraphElement
Node
Relation
Edge
Graph
ObservedGraph
  + ObservedGraph()
  + add(Observer&)
  + remove(Observer&)
Observer
  + Observer(ObservedGraph&)
  + add_node_handler(Node)
  + remove_node_handler(Node)
  + add_edge(Node, Node)
  + remove_edge(Edge)
Association: ObservedGraph (1) to Observer (1..*)
Hierarchy
View
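The callback mechanism can be sketched in a few lines. This is an illustration of the observer pattern only, not the design of the cited library: the class and handler names follow the diagram, whereas making Hierarchy a subclass of Observer, the flat `parent` map, and the rule that new nodes attach to a "root" are assumptions.

```python
class ObservedGraph:
    """Base graph that notifies registered observers about edits."""

    def __init__(self):
        self.nodes = set()
        self._observers = []

    def add(self, observer):
        self._observers.append(observer)

    def remove(self, observer):
        self._observers.remove(observer)

    def add_node(self, node):
        self.nodes.add(node)
        for obs in self._observers:
            obs.add_node_handler(node)

    def remove_node(self, node):
        self.nodes.discard(node)
        for obs in self._observers:
            obs.remove_node_handler(node)


class Observer:
    """Registers itself with the graph on construction."""

    def __init__(self, graph):
        self.graph = graph
        graph.add(self)

    def add_node_handler(self, node):
        pass

    def remove_node_handler(self, node):
        pass


class Hierarchy(Observer):
    """Assumed subclass: keeps a parent map in sync with the base graph."""

    def __init__(self, graph):
        super().__init__(graph)
        self.parent = {}

    def add_node_handler(self, node):
        self.parent[node] = "root"  # assumption: new nodes hang below the root

    def remove_node_handler(self, node):
        self.parent.pop(node, None)
```

Any number of hierarchies can observe the same graph; each one receives every edit and updates only its own state.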
Space Time Trade-offs
• Induced edges are calculated when needed
  - No redundant information
  - Expand/contract worst case: O(|EG| + |VG|)
  - Quick edit operations on the base graph
• Induced edges are stored [Buchsbaum and Westbrook, 2000]
  - Expand/contract in optimal time
  - Space required: O(|EG| D^2)
  - Updates of the base graph are more complicated and need O(D^2) expected time
• Is there a good compromise?
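The first option, computing induced edges on demand, can be sketched as a single pass over the hierarchy and the base edges, which is where the O(|EG| + |VG|) bound comes from. The function name `induced_edges`, the child-map representation, and the treatment of edges as undirected are assumptions made for this illustration.

```python
def induced_edges(children, base_edges, view):
    """Compute the edges between view nodes induced by the base graph.

    Assumes `view` is a valid view, i.e. every base vertex lies below
    exactly one view node, and treats base edges as undirected.
    """
    # Map every base vertex (leaf) to the view node that represents it.
    representative = {}
    for node in view:
        stack = [node]
        while stack:
            current = stack.pop()
            kids = children.get(current, [])
            if kids:
                stack.extend(kids)
            else:
                representative[current] = node

    # One pass over the base edges; edges inside a view node are hidden.
    result = set()
    for a, b in base_edges:
        ra, rb = representative[a], representative[b]
        if ra != rb:
            result.add(frozenset((ra, rb)))
    return result


# Hypothetical hierarchy and base graph (a path v1 - v2 - v3 - v4).
children = {
    "root": ["glycolysis", "tca"],
    "glycolysis": ["v1", "v2"],
    "tca": ["v3", "v4"],
}
base_edges = [("v1", "v2"), ("v2", "v3"), ("v3", "v4")]
```

Nothing is cached here, so edits to the base graph cost nothing extra, but every expand or contract has to repeat the full pass.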
Example: Constructing Hierarchies on
Metabolic Networks
• Motivation
  - Explore properties of hierarchies on metabolic networks
  - Test prototype implementations
  - Study to what extent metabolic networks can be said to be composed of distinct sub- and sub-subnetworks (betweenness centrality could also be used for detecting key reactions/enzymes)
• Previous work
  - Large-scale organization of metabolic networks [Jeong et al., 2000]
  - The small world inside large metabolic networks [Fell and Wagner, 2001]
  - Exploring the pathway structure of metabolism [Schuster et al., 2002]
  - Subnetwork hierarchies of biochemical networks [Holme et al., 2002]
  - Hierarchical analysis of dependency in metabolic networks [Gagneur et al., 2003]
Example: Constructing Hierarchies on
Metabolic Networks
• Data
  - BioPath Database, Computer Chemie Centrum, Prof. Gasteiger
• Decomposition
  - Successively delete nodes according to a global centrality measure
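Definition (Betweenness Centrality)
\[ C_B(r) = \sum_{m \in M} \sum_{m' \in M \setminus \{m\}} \frac{\sigma_{mm'}(r)}{\sigma_{mm'}} \]
with
  r: a reaction
  M: the set of substrates
  \sigma_{mm'}: total number of shortest paths between m and m'
  \sigma_{mm'}(r): number of shortest paths between m and m' passing through r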
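The definition can be evaluated directly on a small example. The sketch below is a brute-force illustration, not the implementation used for the talk: it assumes an undirected, unweighted bipartite graph given as an adjacency dictionary, and the names `bfs_counts` and `reaction_betweenness` are hypothetical. For large networks a faster method such as Brandes' algorithm [4] would be the natural choice.

```python
from collections import deque


def bfs_counts(adj, source):
    """Return (dist, sigma): BFS distances and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
    return dist, sigma


def reaction_betweenness(adj, substrates, r):
    """Sum, over ordered substrate pairs, the fraction of shortest paths through r."""
    info = {v: bfs_counts(adj, v) for v in adj}
    dist_r, sigma_r = info[r]
    total = 0.0
    for m in substrates:
        dist_m, sigma_m = info[m]
        if r not in dist_m:
            continue  # r is unreachable from m
        for m2 in substrates:
            if m2 == m or m2 not in dist_m:
                continue
            # r lies on a shortest m-m2 path iff the distances add up.
            if dist_m[r] + dist_r[m2] == dist_m[m2]:
                total += sigma_m[r] * sigma_r[m2] / sigma_m[m2]
    return total


# Toy chain of substrates A, B, C linked by hypothetical reactions r1, r2.
adj = {
    "A": {"r1"}, "r1": {"A", "B"},
    "B": {"r1", "r2"}, "r2": {"B", "C"},
    "C": {"r2"},
}
```

On this chain, `reaction_betweenness(adj, ["A", "B", "C"], "r1")` counts the ordered pairs (A,B), (B,A), (A,C) and (C,A), each contributing one shortest path through r1.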
Example: Constructing Hierarchies on
Metabolic Networks
• Deleted Metabolites
Metabolite                    Betweenness
PROTON                        5.31227e+06
WATER                         5.2434e+06
ATP                           3.04506e+06
NAD                           2.03219e+06
NADP (reduced)                2.26506e+06
COENZYME A                    1.75499e+06
NADP                          1.83248e+06
NAD (reduced)                 2.23032e+06
PYROPHOSPHATE                 2.04188e+06
ADP                           2.08268e+06
CARBON DIOXIDE                1.89917e+06
PHOSPHATE (with GTP)          2.16805e+06
L-GLUTAMATE                   1.78612e+06
ACETYL-COENZYME A             1.71162e+06
AMP                           1.9995e+06
PYRUVATE                      1.29878e+06
GLYCINE                       1.31198e+06
AMMONIA                       1.18789e+06
2-OXOGLUTARATE                1.19791e+06
PHOSPHATE (protonated)        1.14856e+06
FAD (linked with enzyme)      1.2225e+06
OXALOACETATE                  1.55343e+06
SUCCINYL-COENZYME A
FORMATE
L-SERINE
L-METHIONINE
UTP
GLYCERALDEHYDE 3-PHOSPHATE
UDP
PALMITOYL-ENZYME
Initial Graph: 3548 Nodes
8956 Edges
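The decomposition loop (recompute centrality, delete the most central metabolite, repeat) might be sketched as follows. This is a hedged illustration rather than the prototype itself: it uses the standard node betweenness computed with Brandes' algorithm [4] on an undirected, unweighted graph, counts ordered pairs, and the names `betweenness` and `delete_most_central` are assumptions.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for unweighted graphs (ordered source-target pairs)."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return cb


def delete_most_central(adj, candidates, k):
    """Delete up to k candidate nodes, recomputing centrality after each deletion."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    deleted = []
    for _ in range(k):
        alive = [v for v in candidates if v in adj]
        if not alive:
            break
        scores = betweenness(adj)
        top = max(alive, key=lambda v: scores[v])
        deleted.append((top, scores[top]))
        for neighbour in adj.pop(top):
            adj[neighbour].discard(top)
    return deleted, adj
```

Because the scores are recomputed after every deletion, the recorded values need not decrease monotonically along the deletion order.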
Example: Constructing Hierarchies on
Metabolic Networks
Screenshots
Graph including all BioPath reactions
(random layout with LEDA)
Screenshots
Graph after deleting the 30 most central metabolites
(spring-embedded 3D layout by LEDA)
Screenshot
Graph with data from the citrate cycle
(random layout, manually adjusted)
Future Work
• Implement graphical user interface
• Include more data sources
• Add chemical analysis abilities:
Interact with MOSES, Prof. Gasteiger
• Visualization
Thanks
We want to thank
Prof. E. W. Mayr, Dr. Jens Ernst,
Klaus Holzapfel and Moritz Maass
for ideas and discussion
and Hanjo Täubig for practical support.
References
[1] Buchsbaum, A. L. and Westbrook, J. R.
Maintaining Hierarchical Graph Views. 11th ACM-SIAM Symposium on Discrete
Algorithms, 2000.
[2] Eades, P. and Feng, Q. W.
Multilevel Visualization of Clustered Graphs.
Proc. Graph Drawing, LNCS, Vol. 1190, 101-112, Springer Verlag.
[3] Sugiyama, K. and Misue, K.
Visualization of Structural Information: Automatic Drawing of Compound
Digraphs.
IEEE Trans. Systems, Man and Cybernetics, 21(4), 876-892.
[4] Brandes, U.
A Faster Algorithm for Betweenness Centrality.
Journal of Mathematical Sociology, 25(2): 163-177, 2001.
[5] Gagneur, J., Jackson, D. B. and Casari, G.
Hierarchical analysis of dependency in metabolic networks.
Bioinformatics, Vol. 19, 2003.
[6] Schuster, S., Pfeiffer, T., Moldenhauer, F., Koch, I. and Dandekar, T.
Exploring the pathway structure of metabolism: decomposition into
subnetworks and application to Mycoplasma pneumoniae.
[7] Holme, P., Huss, M. and Jeong, H.
Subnetwork hierarchies of biochemical pathways.
[8] Raitner, M.
A Library for Hierarchies, Graphs and Views.
• Example taken from EcoCyc (http://ecocyc.org/)
  - Useful for getting an overview
  - Limitations here: only a few levels, and the level of detail can only be changed globally
Analysis using Hierarchies
Recent approaches:
• Holme et al., 2002
  - Detecting subnetwork hierarchies of biochemical networks using the betweenness centrality of reactions
• Gagneur et al., 2003
  - Analysis of hierarchical dependencies of subnetworks using connectivity ranking of metabolites
• Schuster et al., 2002
  - Decomposition of the metabolic network using connectivity ranking for metabolites
Resulting Data model
Bipartite hierarchical graph
  - Bipartite node structure with distinct representations for reactions and metabolites
  - Specific information can be attached to the respective graph element (and used for analysis and visualization)
  - Graph-specific algorithms can be implemented for calculations (pathway searches, statistics)
  - Biological concepts (pathways, cell compartments) can be modelled and visualized using hierarchical structures
  - Hierarchical clustering approaches with different criteria can be used for automated network decomposition
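The bipartite node structure and a simple pathway search might look as follows. This is a minimal sketch under stated assumptions: reactions are given as a dictionary of (substrates, products) pairs, reaction and metabolite identifiers are disjoint, edges are treated as undirected, and the names `build_bipartite` and `shortest_path` as well as the example identifiers are hypothetical.

```python
from collections import deque


def build_bipartite(reactions):
    """Build (kind, adj) from {reaction_id: (substrates, products)}."""
    kind, adj = {}, {}
    for rid, (substrates, products) in reactions.items():
        kind[rid] = "reaction"
        adj.setdefault(rid, set())
        for metabolite in list(substrates) + list(products):
            kind[metabolite] = "metabolite"
            adj.setdefault(metabolite, set()).add(rid)
            adj[rid].add(metabolite)
    return kind, adj


def shortest_path(adj, start, goal):
    """Breadth-first pathway search; returns a node list or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if v == goal:
            path = []
            while v is not None:
                path.append(v)
                v = previous[v]
            return path[::-1]
        for w in sorted(adj[v]):  # sorted only to make the result deterministic
            if w not in previous:
                previous[w] = v
                queue.append(w)
    return None


# Hypothetical two-step pathway A -> B -> C.
reactions = {"R1": (["A"], ["B"]), "R2": (["B"], ["C"])}
kind, adj = build_bipartite(reactions)
```

Every edge joins a reaction with a metabolite, so any path found alternates between the two node kinds, for example A, R1, B, R2, C.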
Wrapper Concept
Call: “Tell_IDSets”
Result: IDNameSet1, IDNameSet2, …
Components: Wrapper, Datasource
Wrapper Concept
Call: “Build( IDNameSet )”
Result: Edgeset <IDValue1, IDValue2, IDName1, IDName2> : Set
Components: Wrapper, Datasource
Wrapper Concept
Call: “Tell_PropertySet( IDName )”
Result: PropertyNameSet
Components: Wrapper, Datasource
Wrapper Concept
Call: “Get_PropertySet( IDValueSet, PropertyName )”
Result: Propertyset <IDValue, PropertyValue> : Set
Components: Wrapper, Datasource
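The four wrapper calls can be read as a small interface between the framework and each data source. The sketch below is one possible reading, not the actual BVAM interface: the Python method names, the interpretation of an IDNameSet as a set of identifier type names, the in-memory storage layout, and all example data are assumptions.

```python
from abc import ABC, abstractmethod


class Wrapper(ABC):
    """Hypothetical interface mirroring the four calls on the slides."""

    @abstractmethod
    def tell_id_sets(self):
        """Tell_IDSets: which combinations of ID names can be linked."""

    @abstractmethod
    def build(self, id_name_set):
        """Build: set of (id_value1, id_value2, id_name1, id_name2) tuples."""

    @abstractmethod
    def tell_property_set(self, id_name):
        """Tell_PropertySet: property names available for one ID name."""

    @abstractmethod
    def get_property_set(self, id_values, property_name):
        """Get_PropertySet: set of (id_value, property_value) pairs."""


class InMemoryWrapper(Wrapper):
    """Toy data source held in Python containers (for illustration only)."""

    def __init__(self, edges, properties):
        self._edges = edges            # list of 4-tuples as returned by build
        self._properties = properties  # {id_name: {property: {id_value: value}}}

    def tell_id_sets(self):
        return {frozenset((n1, n2)) for _, _, n1, n2 in self._edges}

    def build(self, id_name_set):
        wanted = frozenset(id_name_set)
        return {e for e in self._edges if frozenset((e[2], e[3])) == wanted}

    def tell_property_set(self, id_name):
        return set(self._properties.get(id_name, {}))

    def get_property_set(self, id_values, property_name):
        result = set()
        for per_name in self._properties.values():
            for id_value, value in per_name.get(property_name, {}).items():
                if id_value in id_values:
                    result.add((id_value, value))
        return result
```

With one such wrapper per source (for example KEGG, BRENDA, or WIT), the framework could build edge sets and fetch properties without knowing how each source stores its data.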
Graph Concepts and Software
• Definitions
  - Clustered Graphs [Eades and Feng, 1996]
  - Compound Graphs [Sugiyama and Misue, 1991]
  - Hierarchical Graph Views [Buchsbaum and Westbrook, 2000]
• Software/Libraries
  - LEDA
  - Boost
  - GTL
  - WilmaScope
  - GVF
  - HGV
Supported Graph Operations
• Navigation/View
  - Expand a node
  - Contract nodes
• Structure
  - Base Graph
    - Insert a new edge between two nodes s, t
    - Delete an edge
    - Insert a new node
    - Delete a node
  - Hierarchy
    - Insert a new step into the hierarchy
    - Remove a step from the hierarchy
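The two navigation operations can be sketched on a view stored as a set of hierarchy nodes. This is an illustration under assumptions: the hierarchy is a child map, `contract` requires all children of the parent to be in the view, and the function and node names are hypothetical.

```python
def expand(view, children, node):
    """Replace a view node by its children in the hierarchy."""
    kids = children.get(node, [])
    if node not in view or not kids:
        raise ValueError("node is not in the view or is a leaf")
    return (view - {node}) | set(kids)


def contract(view, children, parent):
    """Replace all children of parent by parent itself."""
    kids = set(children.get(parent, []))
    if not kids or not kids <= view:
        raise ValueError("all children of parent must be in the view")
    return (view - kids) | {parent}


# Hypothetical hierarchy over four base-graph vertices.
children = {
    "root": ["glycolysis", "tca"],
    "glycolysis": ["v1", "v2"],
    "tca": ["v3", "v4"],
}
view = expand({"root"}, children, "root")  # view is now {"glycolysis", "tca"}
```

Both functions return a new set and leave the old view untouched, which keeps several views over the same hierarchy independent of each other.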