An Edge-Based Framework for Fast Subgraph Matching in a Large

An Edge-Based Framework for Fast
Subgraph Matching in a Large Graph
Sangjae Kim
Inchul Song
Yoon Joon Lee
DASFAA 2011
Outline
• Introduction
• Preliminaries
• Pre-processing
• Filtering
• Verification
• Evaluation
• Conclusions
Introduction
• Graphs are useful for representing structured, complex data.
They have been used in many application areas, such as the Web,
social networks, communication networks, bioinformatics,
ontology engineering, software modeling, and VLSI reverse
engineering.
• Existing approaches mainly focus on reducing the size of the input
to subgraph isomorphism testing.
• These approaches are vertex-based frameworks, in the sense
that they use only vertex information to filter out unqualified
vertices.
Introduction
• Edge information can also be used in the filtering process.
• In this paper, we propose an edge-based framework for fast
subgraph matching in a large graph.
Preliminaries
• Definition 1. Vertex-labeled graph.
A vertex-labeled graph is denoted as G = (V, E, L, l), where V is the
set of vertices, E ⊆ V × V is the set of edges, L is the set of vertex
labels, and l is a mapping function l: V → L.
Preliminaries
• Definition 2. Subgraph isomorphism.
Given two graphs G = (V, E, L, l) and G’ = (V’, E’, L’, l’), G is
subgraph isomorphic to G’ if there exists an injective function
f: V → V’ such that
1. ∀v ∈ V, l(v) = l’(f(v))
2. ∀(u, v) ∈ E, (f(u), f(v)) ∈ E’
Such an injective function is called a subgraph isomorphism mapping.
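The two conditions of Definition 2 can be checked mechanically for a given mapping. The sketch below is illustrative only: the function name `is_subgraph_isomorphism` and the dict/set representation of graphs are assumptions, not part of the paper. Edges are treated as ordered pairs, following E ⊆ V × V; for an undirected graph, both orientations would need to be stored.

```python
def is_subgraph_isomorphism(f, g_edges, g_labels, h_edges, h_labels):
    """Check whether f (a dict V -> V') is a subgraph isomorphism mapping.

    g_labels / h_labels: dicts mapping each vertex to its label (l and l').
    g_edges / h_edges: sets of ordered vertex pairs (E and E').
    This representation is an assumption made for illustration.
    """
    # f must be defined on exactly the vertices of G.
    if set(f) != set(g_labels):
        return False
    # f must be injective: no two vertices of G share an image.
    if len(set(f.values())) != len(f):
        return False
    # Condition 1: labels are preserved, l(v) = l'(f(v)).
    if any(g_labels[v] != h_labels.get(f[v]) for v in g_labels):
        return False
    # Condition 2: every edge of G maps to an edge of G'.
    return all((f[u], f[v]) in h_edges for (u, v) in g_edges)


# Query graph G: one edge between an A-labeled and a B-labeled vertex.
g_labels = {1: "A", 2: "B"}
g_edges = {(1, 2)}
# Database graph G': a path a -> b -> c with labels A, B, A.
h_labels = {"a": "A", "b": "B", "c": "A"}
h_edges = {("a", "b"), ("b", "c")}

print(is_subgraph_isomorphism({1: "a", 2: "b"}, g_edges, g_labels, h_edges, h_labels))
```

This only verifies a mapping that is already given; finding such a mapping is the expensive part that the filtering and verification phases address.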
Preliminaries
Let G1 = (V1, E1) and G2 = (V2, E2) be undirected simple graphs.
If there exists a function f: V1 → V2 such that
(1) f is one-to-one and onto (a bijection), and
(2) ∀a, b ∈ V1, {a, b} ∈ E1 ⇔ {f(a), f(b)} ∈ E2,
then f is called an isomorphism.
Preliminaries
• Vertex Signatures
• Edge Signatures
Pre-processing
v: vertex of the query graph
u: vertex of the database graph
Filtering
• In the filtering phase, the main task is to find, for each query
graph vertex, its candidate vertices in the database graph.
• The edge-based approach has two advantages:
1. It reduces the time to retrieve candidate vertices by using
E-Index, a pre-constructed index structure.
2. Because E-Index stores information on vertex pairs that
are directly connected to each other, it retrieves only
candidate vertices that are directly connected to each other.
Filtering
• The endpoint vertices of the candidate edges become the
candidate vertices, so we need to find candidate edges for the
edges in the query graph. We do not need candidate edges for
every query edge: a set of edges that covers every vertex in the
query graph is enough.
A spanning tree of the query graph provides such a set.
Filtering
• Selecting a Spanning Tree
We take the degree sum of each edge (the sum of the degrees of its
two endpoints) as its weight, compute the maximum-cost spanning
tree, and use the edges of the resulting tree to retrieve candidate edges.
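A minimal sketch of this step, assuming a connected, undirected query graph given as a list of vertex pairs. The function name `max_degree_sum_spanning_tree` and the choice of Kruskal's algorithm with a union-find structure are illustrative assumptions; the slides do not say which spanning-tree algorithm the authors use.

```python
from collections import defaultdict


def max_degree_sum_spanning_tree(edges):
    """Return the edges of a maximum-cost spanning tree of the query graph.

    The weight of an edge (u, v) is deg(u) + deg(v). Kruskal's algorithm
    is an assumption here; any maximum spanning tree algorithm would do.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Union-find: every vertex starts in its own component.
    parent = {v: v for v in degree}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    # Visit edges from heaviest to lightest; keep those joining two components.
    for u, v in sorted(edges, key=lambda e: degree[e[0]] + degree[e[1]], reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree.append((u, v))
    return tree


# A triangle 1-2-3 with a pendant vertex 4 attached to vertex 3.
query_edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
print(max_degree_sum_spanning_tree(query_edges))
```

Preferring edges with a high degree sum keeps the tree centered on well-connected query vertices, which tend to have more selective neighbor information.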
Filtering
• Discovering Candidate Vertices
First, for the first edge e1 = (v1, v1’) in the spanning tree,
we probe L-Index with the key (l(v1), l(v1’)) to obtain a
pointer to a D-Index, and then perform a range query over that
D-Index to retrieve the list of candidate edges.
We then compare the neighbor information of the candidate edges
with that of the query edge to find the final candidate edges.
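The two-level lookup can be sketched as follows. The slides do not specify how D-Index is keyed or what the neighbor information contains, so this toy version makes loud assumptions: L-Index is a dict keyed by a label pair, each D-Index is a list sorted by the degree of the first endpoint, the range query keeps entries whose first-endpoint degree is at least that of the query vertex, and a degree check on the second endpoint stands in for the neighbor-information comparison. The names `build_index` and `candidate_edges` are hypothetical.

```python
import bisect
from collections import defaultdict


def build_index(edges, labels):
    """Build a toy L-Index: (label, label) -> D-Index.

    Each D-Index is a sorted list of (deg(a), deg(b), a, b) entries.
    Keying D-Index by degree is an assumption made for this sketch.
    """
    degree = defaultdict(int)
    for u, w in edges:
        degree[u] += 1
        degree[w] += 1
    l_index = defaultdict(list)
    for u, w in edges:
        # Store both orientations so an undirected edge matches either label order.
        for a, b in ((u, w), (w, u)):
            l_index[(labels[a], labels[b])].append((degree[a], degree[b], a, b))
    for d_index in l_index.values():
        d_index.sort()
    return l_index


def candidate_edges(l_index, label_pair, deg_v, deg_v2):
    """Probe L-Index, range-query the D-Index, then post-filter."""
    d_index = l_index.get(label_pair, [])
    # Range query: first entry whose first-endpoint degree is >= deg_v.
    start = bisect.bisect_left(d_index, (deg_v,))
    # Simplified stand-in for the neighbor-information comparison.
    return [(a, b) for (_, deg_b, a, b) in d_index[start:] if deg_b >= deg_v2]


# Database graph: path a - b - c - d with labels A, B, A, B.
db_labels = {"a": "A", "b": "B", "c": "A", "d": "B"}
db_edges = [("a", "b"), ("b", "c"), ("c", "d")]
index = build_index(db_edges, db_labels)
# Query edge with labels (A, B) whose endpoints both have degree 2.
print(candidate_edges(index, ("A", "B"), 2, 2))
```

Because the index only ever returns pairs that are actually adjacent in the database graph, every candidate vertex it yields already satisfies the connectivity required by the query edge.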
Verification
• Vertex ordering
Maintain a visited vertex set, denoted Visit.
Start with the vertex that has the smallest candidate set and add it to Visit.
• Connection-aware forward checking
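The vertex ordering can be sketched as below. The slides state only the starting rule (the vertex with the smallest candidate set); how later vertices are chosen is not given here, so this sketch assumes that the next vertex is always an unvisited neighbor of Visit with the smallest candidate set. The function name `matching_order` and the tie-breaking by vertex name are also assumptions.

```python
def matching_order(adjacency, candidates):
    """Return an order in which to match the query vertices.

    adjacency: dict mapping each query vertex to a list of its neighbors.
    candidates: dict mapping each query vertex to its candidate vertices.
    Only the choice of the first vertex comes from the slides; the rule
    for extending Visit is an assumption made for illustration.
    """
    # Start with the vertex that has the smallest candidate set.
    start = min(adjacency, key=lambda v: (len(candidates[v]), v))
    visit = [start]
    visited = {start}
    while len(visit) < len(adjacency):
        # Unvisited vertices adjacent to at least one visited vertex.
        frontier = {w for v in visit for w in adjacency[v] if w not in visited}
        if not frontier:
            break  # remaining vertices are disconnected from Visit
        nxt = min(frontier, key=lambda v: (len(candidates[v]), v))
        visit.append(nxt)
        visited.add(nxt)
    return visit


adjacency = {
    "v1": ["v2", "v3"],
    "v2": ["v1", "v3"],
    "v3": ["v1", "v2", "v4"],
    "v4": ["v3"],
}
candidates = {
    "v1": ["u1", "u2", "u3"],
    "v2": ["u4", "u5"],
    "v3": ["u1", "u2", "u3", "u4", "u5"],
    "v4": ["u6"],
}
print(matching_order(adjacency, candidates))
```

Keeping each new vertex adjacent to Visit means every extension of a partial match can be checked against an already-matched neighbor.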
Verification
• The FastMatch Algorithm
Verification
• The GetQualifiedCandidateVertices Function
Evaluation
Experimental results over various query graph sizes
Evaluation
Experimental results over various average query graph degrees
Conclusions
