Learning a Hidden Graph with
Adaptive Algorithms
Hung-Lin Fu and Chie-Huai Shih
National Chiao Tung University
Hsin Chu, Taiwan 30050
Speaker: Chie-Huai Shih
Outline

Motivation

Preliminaries

Two algorithms

Main result

Concluding remarks
Motivation
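Introduction:

In October 1990, the Human Genome Project (HGP), supported by the
U.S. Department of Energy (DOE) and the National Institutes of
Health (NIH), was formally launched.
The main goal of the HGP is to investigate the composition of the
human genome: improving the existing genetic map, constructing a
physical map, finding all human genes, and determining the complete
human genome sequence.
Over the past several years the HGP has made considerable progress,
and the molecular genetics technology derived from it has been
widely applied across the fields of medicine.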

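In 1998 Venter founded Celera, the first private company in the
United States to undertake a human genome project, and boldly
announced that the human DNA sequence would be completely decoded
within three years at a cost of 300 million US dollars.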
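DNA Sequencing Technology


DNA is made up of four basic "codes":
A: adenine nucleotide
T: thymine nucleotide
C: cytosine nucleotide
G: guanine nucleotide
The bases of these four nucleotides pair with one another; that is,
their chemical properties and shapes are complementary. The pairs
are A with T and C with G.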
Random shotgun approach
genomic segment
cut many times at
random (Shotgun)
merge reads into contigs
Preliminaries
Models
1) Multi-vertex model
2) Quantitative multi-vertex model
3) k-vertex model
4) Quantitative k-multi-vertex model
This thesis uses model (1), the multi-vertex model.


Learning a hidden graph by edge-detecting queries:
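For concreteness, an edge-detecting query can be simulated as below. This is a minimal sketch, assuming (consistently with the examples in these slides) that Q(S) = 1 exactly when the vertex set S contains both endpoints of at least one hidden edge. The helper name `make_oracle` and the sample edge list are hypothetical and not part of the talk.

```python
from typing import Callable, Iterable, List, Tuple


def make_oracle(edges: Iterable[Tuple[int, int]]) -> Tuple[Callable, List[int]]:
    """Return (Q, counter). Q answers edge-detecting queries on a hidden
    graph; counter[0] records how many queries have been asked."""
    hidden = [frozenset(e) for e in edges]
    counter = [0]

    def Q(S: Iterable[int]) -> int:
        counter[0] += 1
        chosen = set(S)
        # 1 if some hidden edge has both endpoints inside S, else 0.
        return int(any(e <= chosen for e in hidden))

    return Q, counter


# Hypothetical hidden graph on vertices 1..8 (an assumption for illustration).
Q, counter = make_oracle([(3, 5), (5, 6), (7, 8)])
answers = [Q({1, 2, 3, 4}), Q({1, 2, 3, 4, 5}), Q({5, 3})]
```

The learner sees only the 0/1 answers; the cost of an algorithm is the number of calls to Q.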

Algorithm
 Sequential (adaptive)
Nonadaptive ~ 1 round
Several rounds
Algorithm A(V)

Part 1. FIND_ONE_VERTEX(V)
Part 2. FIND_ONE_EDGE(V)
Lemma 1. Algorithm A(V) finds an arbitrary edge in G[V] using at
most 2 log n edge-detecting queries, where n is the size of V.
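The two parts can be sketched as two binary searches. This is a hedged reconstruction from the worked example in these slides, not the author's pseudocode: Part 1 looks for the shortest prefix of V that contains an edge, so its last vertex v has a neighbor among the earlier vertices, which form an independent set; Part 2 halves that set until one neighbor remains. The names `make_oracle` and `find_one_edge` and the sample graph are assumptions.

```python
from typing import Optional, Sequence, Tuple


def make_oracle(edges):
    """Simulated edge-detecting oracle (hypothetical helper)."""
    hidden = [frozenset(e) for e in edges]
    counter = [0]

    def Q(S):
        counter[0] += 1
        chosen = set(S)
        return int(any(e <= chosen for e in hidden))

    return Q, counter


def find_one_edge(vertices: Sequence[int], Q) -> Optional[Tuple[int, int]]:
    order = list(vertices)
    if Q(order) == 0:
        return None  # G[V] contains no edge
    # Part 1: binary search for the shortest prefix containing an edge.
    # Invariant: order[:lo] has no edge, order[:hi] has at least one.
    lo, hi = 1, len(order)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if Q(order[:mid]):
            hi = mid
        else:
            lo = mid
    v, candidates = order[hi - 1], order[:hi - 1]
    # Part 2: candidates is independent and contains a neighbor of v,
    # so Q({v} + T) = 1 exactly when T holds a neighbor of v.
    while len(candidates) > 1:
        half = candidates[:len(candidates) // 2]
        candidates = half if Q([v] + half) else candidates[len(half):]
    return candidates[0], v


Q, counter = make_oracle([(3, 5), (5, 6), (7, 8)])  # hypothetical graph
edge = find_one_edge(range(1, 9), Q)
```

On this sample graph the sketch finds the edge between 3 and 5 with six queries, which equals 2 log n for n = 8. For other values of n this particular implementation may differ from the bound in Lemma 1 by a small additive term.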
Example
G: [figure: hidden graph on vertices 1 to 8]
Part 1.
Q({1,2,3,4,5,6,7,8}) = 1
Q({1,2,3,4}) = 0
Q({1,2,3,4,5,7}) = 1
Example
Part 1.
Q({1,2,3,4,5}) = 1
v = {5}, S \ {v} = {1, 2, 3, 4}
Part 2.
Q({5,1,2}) = 0
Q({5,3}) = 1
Find the edge {5,3}.
Algorithm B(v, I)

Lemma 2. Algorithm B(v, I) identifies all the edges between vertex v
and independent set I using no more than 2s log n edge-detecting
queries, where n is the size of I and s is the number of edges
between v and I.
NOTE. The average cost per edge between v and I in Algorithm B(v, I)
is 2 log n edge-detecting queries.
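One way to realize such a procedure is recursive halving of I. The slides do not give the steps of Algorithm B, so the following is only a sketch under that assumption: because I is independent and v is not in I, Q({v} ∪ T) = 1 exactly when v has a neighbor in T ⊆ I. The names `make_oracle` and `neighbors_in_independent_set` and the sample graph are hypothetical.

```python
from typing import List, Sequence


def make_oracle(edges):
    """Simulated edge-detecting oracle (hypothetical helper)."""
    hidden = [frozenset(e) for e in edges]
    counter = [0]

    def Q(S):
        counter[0] += 1
        chosen = set(S)
        return int(any(e <= chosen for e in hidden))

    return Q, counter


def neighbors_in_independent_set(v: int, I: Sequence[int], Q) -> List[int]:
    """Return every vertex of the independent set I adjacent to v."""
    found = []
    stack = [list(I)]
    while stack:
        part = stack.pop()
        if not part or Q([v] + part) == 0:
            continue  # v has no neighbor in this part
        if len(part) == 1:
            found.append(part[0])  # a single vertex answered 1: an edge
        else:
            mid = len(part) // 2
            stack.append(part[mid:])  # right half, examined second
            stack.append(part[:mid])  # left half, examined first
    return found


Q, counter = make_oracle([(5, 1), (5, 4)])  # hypothetical: v = 5, s = 2
found = neighbors_in_independent_set(5, [1, 2, 3, 4], Q)
```

Each neighbor of v is reached along a path of about log n halvings, and each halving costs at most two queries, which is where a cost on the order of 2s log n comes from in this sketch.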
Example
[figure: vertex v and independent set I, with vertex labels 1 to 5]
Lower bound

Theorem 3. For any [...], [...] edge-detecting queries are required
to identify a graph drawn from the class of all graphs with n
vertices and m edges.
Proof.

Theorem 4. For any adaptive algorithm in model 1, [...]
edge-detecting queries are required to identify a graph drawn from
the class of all graphs with n vertices and m edges.
Main result (adaptive algorithm)

Theorem 5. There exists an adaptive algorithm that learns a general
graph with n vertices and m edges using at most m(2 log n + 9)
queries.
How do we reconstruct a general hidden graph G = (V, E)?
Note that all the edges between two independent sets can be found by
applying Algorithm B(v, I) to each vertex v of one set, with the
other set as I.
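The note can be illustrated with a short sketch that learns every edge between two independent sets by running an Algorithm B style search from each vertex of the first set. The halving search used here is an assumption (the slides do not spell out Algorithm B), and the names `make_oracle`, `neighbors_in` and `edges_between` as well as the sample graph are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_oracle(edges):
    """Simulated edge-detecting oracle (hypothetical helper)."""
    hidden = [frozenset(e) for e in edges]

    def Q(S):
        chosen = set(S)
        return int(any(e <= chosen for e in hidden))

    return Q


def neighbors_in(v: int, I: Sequence[int], Q) -> List[int]:
    """Assumed Algorithm B style search: neighbors of v in independent I."""
    found, stack = [], [list(I)]
    while stack:
        part = stack.pop()
        if not part or Q([v] + part) == 0:
            continue
        if len(part) == 1:
            found.append(part[0])
        else:
            mid = len(part) // 2
            stack.extend([part[mid:], part[:mid]])
    return found


def edges_between(I1: Sequence[int], I2: Sequence[int], Q) -> List[Tuple[int, int]]:
    """All edges with one endpoint in I1 and the other in I2.
    Both sets must be independent and disjoint."""
    return [(v, u) for v in I1 for u in neighbors_in(v, I2, Q)]


Q = make_oracle([(1, 3), (2, 3), (2, 5)])  # hypothetical bipartite edges
learned = edges_between([1, 2], [3, 4, 5], Q)
```

Since each query contains one vertex of the first set and a subset of the second, independence of the second set guarantees that a positive answer always points to an edge at v.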



Algorithm 1. MAXIMAL_MATCHING(V)
Algorithm 2. PARTITION_OF_VERTEX_SET(V)
Algorithm 3. HIDDEN_GRAPH(V)
Example
PARTITION_OF_VERTEX_SET(V)
HIDDEN_GRAPH(V)
G: [figure: hidden graph on vertices 1 to 8]
MAXIMAL_MATCHING(V)
Algorithm A({1,2,3,4,5,6,7,8}): edge {1,3}
Algorithm A({2,4,5,6,7,8}): edge {2,4}
Algorithm A({5,6,7,8}): edge {5,7}
Q({8,6}) = 0
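The trace above suggests the following loop: call Algorithm A on the remaining vertices, remove both endpoints of the edge it returns, and stop once the remaining set answers 0. The sketch below follows that reading; it is an assumption rather than the author's pseudocode, and the names `make_oracle`, `find_one_edge` and `maximal_matching` and the sample graph are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_oracle(edges):
    """Simulated edge-detecting oracle (hypothetical helper)."""
    hidden = [frozenset(e) for e in edges]

    def Q(S):
        chosen = set(S)
        return int(any(e <= chosen for e in hidden))

    return Q


def find_one_edge(vertices: Sequence[int], Q):
    """Assumed Algorithm A: two binary searches, or None if no edge."""
    order = list(vertices)
    if Q(order) == 0:
        return None
    lo, hi = 1, len(order)  # order[:lo] has no edge, order[:hi] has one
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if Q(order[:mid]):
            hi = mid
        else:
            lo = mid
    v, candidates = order[hi - 1], order[:hi - 1]
    while len(candidates) > 1:
        half = candidates[:len(candidates) // 2]
        candidates = half if Q([v] + half) else candidates[len(half):]
    return candidates[0], v


def maximal_matching(vertices: Sequence[int], Q) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Return (matching, leftover); leftover is an independent set."""
    remaining, matching = list(vertices), []
    while True:
        edge = find_one_edge(remaining, Q)
        if edge is None:
            return matching, remaining
        matching.append(edge)
        remaining = [x for x in remaining if x not in edge]


hidden_edges = [(1, 3), (2, 4), (5, 7), (3, 4), (1, 2)]  # hypothetical graph
Q = make_oracle(hidden_edges)
matching, leftover = maximal_matching(range(1, 9), Q)
```

Which edges end up in the matching depends on the vertex order used by the search, so the output need not coincide with the edges shown on the slide; what the loop guarantees is that the unmatched vertices form an independent set.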
Complexity

The number of queries is at most m(2 log n + 9).
Proof. By Lemma 1 and Lemma 2.
Algorithm 1.
Line      Number of queries
2         [...]
3         [...]
total     [...]
Algorithm 2.
Line      Number of queries
2         [...]
3         [...]
total     [...]
Algorithm 3.
Line      Number of queries
1         [...]
7         [...]
14+17     0 (all of these queries were already answered in Algorithm 2, line 10)
15+18     [...]
26        [...]
total     [...]
Concluding remarks


Reduce the number of rounds of Algorithm 1 (i.e., obtain an
efficient algorithm to find a maximal matching).
Learn a hidden graph in the quantitative k-multi-vertex model.
MegaBACE 4000
Thank you for your attention!