Special Lecture on Bioinformatics and Life Science - Kyoto University Bioinformatics Center

Kyushu University Mathematics Intensive Lecture
Comparison, Analysis, and Control of
Biological Networks (7)
Partial k-Trees, Color Coding, and
Comparison of Graphs
Tatsuya Akutsu
Bioinformatics Center
Institute for Chemical Research
Kyoto University
Tree Decomposition and
Partial k-Tree
[Flum, Grohe: Parameterized Complexity Theory, Springer]
Tree Decomposition
Tree decomposition of G(V,E)
Pair of a rooted tree and a family of sets of vertices (bags)
$(T(V_T, E_T), (B_t)_{t \in V_T})$ such that
For all $v \in V$, $B^{-1}(v) = \{ t \in V_T \mid v \in B_t \}$ is nonempty and connected in T
For all $\{u,v\} \in E$, $u, v \in B_t$ holds for some $t \in V_T$
Width
$\max_{t \in V_T} |B_t| - 1$
Treewidth
Minimum width over all possible tree decompositions of G
Examples
⇒ treewidth of tree is 1
⇒ treewidth of cycle is 2
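The two conditions of the definition can be checked mechanically. The following is a minimal sketch, not part of the original slides; the function names `is_tree_decomposition` and `width` and the dictionary-based input format are assumptions made for illustration. It assumes (without verifying) that the given tree edges really form a tree.

```python
from collections import deque

def is_tree_decomposition(vertices, edges, tree_edges, bags):
    """Check the two conditions of a tree decomposition.

    vertices:   vertices of G
    edges:      pairs (u, v), the edges of G
    tree_edges: pairs (s, t), the edges of the tree T (assumed to be a tree)
    bags:       dict mapping each tree node t to its bag B_t (a set)
    """
    adj = {t: set() for t in bags}
    for s, t in tree_edges:
        adj[s].add(t)
        adj[t].add(s)

    # Condition 1: B^{-1}(v) is nonempty and connected in T.
    for v in vertices:
        nodes = {t for t, bag in bags.items() if v in bag}
        if not nodes:
            return False
        start = next(iter(nodes))
        seen, queue = {start}, deque([start])
        while queue:  # BFS restricted to tree nodes whose bag contains v
            t = queue.popleft()
            for s in adj[t]:
                if s in nodes and s not in seen:
                    seen.add(s)
                    queue.append(s)
        if seen != nodes:
            return False

    # Condition 2: every edge of G is contained in some bag.
    return all(any(u in bag and v in bag for bag in bags.values())
               for u, v in edges)

def width(bags):
    """Width of a decomposition: max_t |B_t| - 1."""
    return max(len(bag) for bag in bags.values()) - 1

# Example: the 4-cycle a-b-c-d-a with two bags of size 3 (width 2).
cycle_vertices = ["a", "b", "c", "d"]
cycle_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
cycle_bags = {0: {"a", "b", "c"}, 1: {"a", "c", "d"}}
cycle_tree = [(0, 1)]
```

Splitting the cycle into one bag per edge and arranging the bags on a path fails condition 1, because the two bags containing `a` are not adjacent in the tree; this is why a cycle needs bags of size 3.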
Several Properties
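Prop. Let $t_1, \dots, t_h$ be the children of node t in $T(V_T, E_T)$.
For all $i \neq j$, $(B_{t_i} \setminus B_t) \cap (B_{t_j} \setminus B_t) = \emptyset$
Prop. Let s be the parent and $t_1, \dots, t_h$ be the children of node t.
For all j, $(B_s \setminus B_t) \cap (B_{t_j} \setminus B_t) = \emptyset$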
⇒ Many optimization problems can be solved in a bottom up manner
Thm. A graph has treewidth at most k if and only if it is a partial k-tree
Thm. Determination of treewidth is NP-hard
Thm. For fixed k, tree decomposition of partial k-tree can
be computed in linear time
Definition of partial k-tree is omitted.
DP Algorithm for Partial k-Trees
For fixed k, many NP-hard problems can be solved in
polynomial time using dynamic programming
Ex. Vertex cover problem


Ch(t): Set of children of node t in tree T
Wt Ws  Wt  ( Bt  Bs )  Ws  ( Bt  Bs )
Dynamic
programming algorithm
OPTt (Wt ) | Wt | 
 min OPT (W ) | W  W

s
s
s
t
W
:
W

W
sCh ( t )  s s t
|

OPT  min OPTr (Wr )
Wr

where Wt is a vertex cover for a subgraph induced by Bt,
r is the root of T.
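The recurrence can be sketched in a few lines of Python. This is an illustrative sketch only, not the lecturer's implementation; the function name `min_vertex_cover_size` and the input format (bags and children given as dictionaries) are assumptions. It enumerates every subset of each bag, so it is practical only for small width.

```python
from itertools import combinations

def min_vertex_cover_size(edges, bags, children, root):
    """Minimum vertex cover size via DP over a rooted tree decomposition.

    edges:    pairs (u, v), the edges of G
    bags:     dict tree node -> bag B_t (a set of vertices)
    children: dict tree node -> list of child nodes Ch(t)
    root:     root r of the decomposition tree
    """
    edge_set = {frozenset(e) for e in edges}

    def covers(bag):
        """All W subset of bag covering every edge of G that lies inside bag."""
        inside = [e for e in edge_set if e <= bag]
        items = list(bag)
        result = []
        for size in range(len(items) + 1):
            for combo in combinations(items, size):
                w = frozenset(combo)
                if all(e & w for e in inside):
                    result.append(w)
        return result

    def opt(t):
        """Return the table {W_t: OPT_t(W_t)} for tree node t."""
        bag_t = frozenset(bags[t])
        child_tables = [(frozenset(bags[s]), opt(s))
                        for s in children.get(t, [])]
        table = {}
        for w_t in covers(bag_t):
            total = len(w_t)
            for bag_s, table_s in child_tables:
                shared = bag_t & bag_s
                # min over W_s consistent with W_t on B_t ∩ B_s;
                # subtract |W_s ∩ W_t| so shared vertices are counted once
                total += min(cost - len(w_s & w_t)
                             for w_s, cost in table_s.items()
                             if w_s & shared == w_t & shared)
            table[w_t] = total
        return table

    return min(opt(root).values())
```

For example, for the 4-cycle a-b-c-d-a with bags {a,b,c} (root) and {a,c,d} (child), the call `min_vertex_cover_size(edges, {0: {"a","b","c"}, 1: {"a","c","d"}}, {0: [1]}, 0)` evaluates the recurrence at the root after the child table has been filled.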
Explanation of DP Algorithm
$\mathrm{OPT}_t(W_t)$: size of a minimum vertex cover of G(t) under the condition that its restriction to $B_t$ is $W_t$
T(t): subtree of T induced by t and its descendants
G(t): subgraph of G induced by $\bigcup_{t' \in V(T(t))} B_{t'}$
The term $-|W_s \cap W_t|$ prevents vertices shared by $W_s$ and $W_t$ from being counted twice
[Figure: node t with bag $B_t$ and its children s, s' with bags $B_s$, $B_{s'}$; the covers $W_t$, $W_s$, $W_{s'}$ are marked]
Analysis of Time Complexity
Let k be a constant.
Tree decomposition can be computed in linear time.
For each $t \in V_T$, at most $2^{k+1}$ sets $W_t$ are tested.
To compute the min inside Σ, $2^{k+1} \times 2^{k+1} = 4^{k+1}$ pairs are tested per edge in T.
Thus, the total complexity is $O(4^k \mathrm{poly}(n))$.
Applications to Bioinformatics

Graphs representing structures of proteins
and RNAs are considered to have small
treewidth
Examples
- Protein threading
- Protein side-chain packing
- Protein structure alignment
- Comparison of RNA secondary structures
- Attractor detection in Boolean networks
Color Coding
[Alon et al.: J. ACM 1995]
k-Path Problem
Input: undirected graph G(V,E), integer k
Output: simple path of G with k vertices (no vertex is visited twice)
NP-hard ⇐ Hamiltonian path problem if k=n (=|V|)
Naïve algorithm: For each vertex v, examine neighbors, neighbors of neighbors, …
⇒ $O(n^k)$ time
Idea


Partition V into k subsets (color vertices using k colors)
If lucky, all vertices of a k-path lie in different subsets
(analysis of such probability ⇒ randomized algorithm)
DP Algorithm

For each v , examine whether there exists k-path starting
from v

Path can be reconstructed by traceback
P(u,C): 1 if there exists a path from v to u using each color in C
exactly once, otherwise 0 (C is a subset of {1,2,…,k})
Initialization: P(v,{f(v)}) ← 1, all other entries are 0 (f(v) is the color of v)
Recursion (in the order of |C|=1 to |C|=k-1): for each {u,w}∈E,
$P(u, C \cup \{f(u)\}) \leftarrow 1$ if $P(w, C) = 1$ and $f(u) \notin C$
[Figure: example on vertices v, w, u1, u2 with colors R, Y, B, G: P(v,{R})=1, P(w,{R,Y,B})=1, P(u1,{R,Y,B,G})=1]
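The DP for a fixed coloring can be sketched as follows. This is a minimal illustration under assumed names (`has_colorful_path`, adjacency lists in a dictionary `adj`, colors in a dictionary `color`); instead of a 0/1 table it stores the set of pairs (u, C) with P(u,C)=1, layer by layer in order of |C|. Traceback is omitted.

```python
def has_colorful_path(adj, color, k):
    """True if some path on k vertices uses k pairwise distinct colors.

    adj:   dict vertex -> list of neighbors (undirected graph)
    color: dict vertex -> color f(v)
    k:     number of vertices on the path
    """
    for v in adj:  # try every start vertex v
        # layer holds all (u, C) with P(u, C) = 1 and |C| fixed
        layer = {(v, frozenset([color[v]]))}
        for _ in range(k - 1):
            next_layer = set()
            for w, c in layer:
                for u in adj[w]:
                    if color[u] not in c:  # f(u) must be a new color
                        next_layer.add((u, c | {color[u]}))
            layer = next_layer
        if layer:  # some (u, C) with |C| = k is reachable
            return True
    return False
```

Because all colors on the path are distinct, no vertex can repeat, so a path found this way is automatically simple. That is the point of the coloring: the DP only has to remember the set of colors used, not the set of vertices visited.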
Analysis of Time Complexity
Lemma: The above algorithm works in $O(2^k \mathrm{poly}(n))$ time
Proof: The number of subsets C is $2^k$. Thus, it is enough to compute $2^k n$ values P(u,C).
This computation is done for every initial vertex v, which adds an O(n) factor
Analysis of Success Probability
Lemma: Let P be a k-path of G. Under a uniformly random coloring, the probability that the k vertices of P have distinct colors is $\geq e^{-k}$
Proof: The number of colorings of P is $k^k$. On the other hand, the number of successful colorings is $k!$. Therefore, by Stirling's formula, we have
$\frac{k!}{k^k} > \frac{\sqrt{2\pi k}\,(k/e)^k}{k^k} = \sqrt{2\pi k}\, e^{-k} > e^{-k}$
Theorem: By repeating the algorithm at least $e^k$ times, a solution can be obtained (if any) with probability $\geq 1/2$
Proof: The probability that all trials fail is bounded by
$(1 - e^{-k})^{e^k} \leq e^{-1} < \frac{1}{2}$
The algorithm never outputs a wrong solution
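Both bounds are easy to check numerically for small k. The helper names `colorful_probability` and `failure_bound` below are hypothetical and exist only for this check; the number of trials is taken as the ceiling of e^k.

```python
import math

def colorful_probability(k):
    """Probability k!/k^k that k fixed vertices get k distinct colors."""
    return math.factorial(k) / k**k

def failure_bound(k):
    """Upper bound on the probability that ceil(e^k) independent trials
    all fail, assuming each trial succeeds with probability >= e^{-k}."""
    trials = math.ceil(math.exp(k))
    return (1 - math.exp(-k)) ** trials

for k in range(1, 11):
    # Lemma: k!/k^k >= e^{-k};  Theorem: failure probability <= 1/e < 1/2
    print(k, colorful_probability(k) >= math.exp(-k), failure_bound(k) < 0.5)
```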
Derandomization
Idea: use of hash function families
k-perfect hash functions: Let F be a family of hash functions from V={1,2,…,n} to {1,2,…,k}. F is called a family of k-perfect hash functions if, for every k-element subset S of V, there exists a function f∊F that is one-to-one on S
Theorem: For any n and k, a family of k-perfect hash functions consisting of $2^{O(k)} \cdot \log^2 n$ functions can be constructed in $2^{O(k)} \cdot n \cdot \log^2 n$ time
⇒ In place of random coloring, it is enough to examine all f given by this theorem
Corollary: The k-Path Problem can be solved in $2^{O(k)} \cdot \mathrm{poly}(n)$ time
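The definition of a k-perfect family can be tested by brute force on tiny instances. The sketch below is illustrative only and is not the construction behind the theorem; `is_k_perfect` and `bit_family` are hypothetical names, and each hash function is represented as a dictionary from {1,…,n} to {1,…,k}.

```python
from itertools import combinations

def is_k_perfect(family, n, k):
    """Check that every k-element subset S of {1..n} is mapped one-to-one
    by at least one function in family (each function is a dict)."""
    for subset in combinations(range(1, n + 1), k):
        if not any(len({f[x] for x in subset}) == k for f in family):
            return False
    return True

def bit_family(n):
    """A small 2-perfect family: f_b(x) = (b-th bit of x-1) + 1.
    Two distinct numbers differ in some bit, so some f_b separates them."""
    bits = max(1, (n - 1).bit_length())
    return [{x: (((x - 1) >> b) & 1) + 1 for x in range(1, n + 1)}
            for b in range(bits)]

family = bit_family(8)   # 3 functions suffice for n = 8, k = 2
print(len(family), is_k_perfect(family, 8, 2))
```

The bit family only handles k=2, but it shows the flavor of the theorem: a number of functions that grows logarithmically in n replaces the random choice of a coloring.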
Applications of Color Coding
The 'path' in color coding can be extended to small trees and small subgraphs (network motifs)
⇒ Applications to bioinformatics
- Network motif [Alon et al.: Bioinformatics, 2008]
- Signal pathway analysis [Hüffner et al.: Bioinformatics 2007 & Algorithmica 2008]
- Network marker [Dao et al.: Bioinformatics 2011]
- Pathway search/alignment [Shlomi et al.: BMC Bioinformatics 2006]
Comparison of Chemical Graphs
Chemical Structures and Graphs

- Tree: graph without cycle
- Almost tree: tree + some edges (in each biconnected component)
- Outerplanar graph: no crossing edges, no internal vertex
- Partial k-tree: decomposed into a tree by identifying at most k+1 vertices as one node
Partial k-trees

Partial k-tree (treewidth ≦ k)
- Decomposed into a tree by identifying at most k+1 vertices as one node
- Outerplanar graphs are partial 2-trees

Chemical compounds in NCI database [Horvath & Ramon, TCS 2010]
treewidth    number of compounds
1 (tree)     21,950
2            221,675
3            6,548
≧4           65
If we can design efficient algorithms for partial 4-trees,
we can cover almost all chemical compounds
Three Matching Problems
- Graph isomorphism: Are two graphs essentially the same?
- Subgraph isomorphism: Is one graph a part of the other graph?
- Maximum common subgraph: Largest (connected) common part between two given graphs
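To make the first two problems concrete, here is a brute-force sketch that tries every injective vertex mapping. It is exponential and meant only for very small graphs; the function names are hypothetical, graphs are assumed to be simple and undirected, and "part of" is interpreted here as a not necessarily induced subgraph.

```python
from itertools import permutations

def _edge_set(edges):
    """Undirected edges as a set of frozensets."""
    return {frozenset(e) for e in edges}

def is_subgraph_isomorphic(pattern_vertices, pattern_edges,
                           host_vertices, host_edges):
    """True if the pattern can be embedded into the host: an injective
    vertex map sending every pattern edge to a host edge."""
    host = _edge_set(host_edges)
    pv = list(pattern_vertices)
    for image in permutations(list(host_vertices), len(pv)):
        phi = dict(zip(pv, image))
        if all(frozenset((phi[u], phi[v])) in host for u, v in pattern_edges):
            return True
    return False

def is_isomorphic(v1, e1, v2, e2):
    """True if the two simple graphs are isomorphic: equal vertex and
    edge counts plus an edge-preserving injective map."""
    v1, v2 = list(v1), list(v2)
    return (len(v1) == len(v2)
            and len(_edge_set(e1)) == len(_edge_set(e2))
            and is_subgraph_isomorphic(v1, e1, v2, e2))
```

The maximum common subgraph problem generalizes both: it asks for the largest graph that embeds into each of the two inputs.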
Complexity of Graph Comparison Problems

Graph isomorphism
- Polynomial time for bounded degree graphs [Luks, JCSS, 1982]
- However, not practical because the algorithm is too complicated (based on group theory)
Subgraph isomorphism
- Polynomial time for partial k-trees of bounded degree [Matousek & Thomas, Disc. Math., 1992]
- However, the algorithm is still too complicated
Maximum common subgraph
- trees: polynomial time [Matula, Ann. Disc. Math, 1978]
- almost trees: polynomial time [Akutsu, IEICE Trans., 1993]
- outerplanar graphs: polynomial time [Akutsu & Tamura, Algorithms, 2013]
- partial k-trees: NP-hard for k=11 [Akutsu & Tamura, Proc. ISAAC 2013]
- partial k-trees with k=3: open problem (since we recently improved the NP-hardness result to k=4)
Algorithm for Outerplanar Graphs: Key Idea

Difficulty: need to find cut points ⇒ easily leads to combinatorial explosion
Idea: introduction of the concept of blade
Lemma: the number of blades is $O(n^2)$ ⇒ polynomial time algorithm
Maximum Common Subgraph: Summary

- Trees: polynomial time [Matula, Ann. Disc. Math, 1978]
- Almost trees: polynomial time [Akutsu, IEICE Trans., 1993]
- Outerplanar graphs of bounded degree: polynomial time [Akutsu & Tamura, Algorithms, 2013]
- Partial k-trees of bounded degree: NP-hard [Akutsu & Tamura, Proc. ISAAC 2013]
  ⇔ Polynomial time for subgraph isomorphism [Matousek & Thomas, Disc. Math., 1992]
Summary

- Tree Decomposition
  - For fixed k, many NP-hard problems can be solved in polynomial time by DP algorithms
  - Applications to analysis of protein/RNA structures
- Color Coding
  - Useful for finding small paths/subgraphs in networks
  - Applications to biological pathway analysis
- Comparison of Chemical Graphs
  - The maximum common subgraph problem is NP-hard even for partial k-trees with k=4, but is solvable in polynomial time for outerplanar graphs