TMSG
ROAD: A New Spatial Object Search Framework for Road Networks
Angus Fuming Huang
2012.04.18
AFH
1
Publication
OUTLINE
• INTRODUCTION
• THE ROAD FRAMEWORK
• SINGLE-SOURCE LDSQ ALGORITHMS
• MULTISOURCE LDSQ ALGORITHMS
• PERFORMANCE EVALUATION
• CONCLUSION
• Angus Comments
Introduction
• Location-based services are booming
– Map and navigation services
– Garmin, GoogleMap, MapQuest, NavTeq, YahooMap
• Location-dependent information
– spatial objects
• Location-dependent spatial queries (LDSQs)
– Queries that search for spatial objects with respect to user-specified locations
• Q1: find hotels within one mile of the conference venue
Introduction (contd.)
• Two basic operations for LDSQs processing
• Network traversal
– Visit network nodes/edges according to network proximity
• Object lookup
– Access and check the attributes of objects located at
traversed nodes or edges
• Network is modeled as a graph
– Search space pruning
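The two operations can be illustrated with a minimal sketch of a plain network-expansion range query (no Rnets or shortcuts yet). It is not the authors' code: the adjacency-list layout, the `range_search` name, the object dictionaries, and the toy distances in miles are all assumptions made for illustration, and objects are placed on nodes only for simplicity.

```python
import heapq

def range_search(graph, objects, source, max_dist, predicate):
    """Return (object, distance) pairs within max_dist of source.

    graph:   {node: [(neighbor, edge_length), ...]}   (assumed layout)
    objects: {node: [object, ...]} objects located at nodes (assumed)
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    settled = set()
    results = []
    while heap:
        d, node = heapq.heappop(heap)
        if node in settled:
            continue
        settled.add(node)
        # Object lookup: check attributes of objects at the visited node.
        for obj in objects.get(node, []):
            if predicate(obj):
                results.append((obj, d))
        # Network traversal: visit neighbors in order of network proximity.
        for nbr, length in graph.get(node, []):
            nd = d + length
            if nd <= max_dist and nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return results

# Toy network (hypothetical), edge lengths in miles.
graph = {
    "venue": [("b", 0.4)],
    "b": [("venue", 0.4), ("c", 0.5)],
    "c": [("b", 0.5), ("d", 2.0)],
    "d": [("c", 2.0)],
}
objects = {
    "b": [{"name": "h1", "type": "hotel"}],
    "c": [{"name": "c1", "type": "cafe"}],
    "d": [{"name": "h2", "type": "hotel"}],
}
# Q1-style query: hotels within one mile of the venue.
hits = range_search(graph, objects, "venue", 1.0,
                    lambda o: o["type"] == "hotel")
hit_names = [o["name"] for o, _ in hits]
```

In this sketch every node within the range is visited, whether or not it holds an object of interest, which is the cost that search space pruning aims to reduce.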
Introduction (contd.)
• A network is formulated as a set of
interconnected regional subnets called Rnet
– Shortcuts
• Selective paths across an Rnet that enable any traversal to bypass
the Rnet if it has no object of interest
– Object abstract
• The existence and/or contents of objects that are inside the Rnets to
provide quick traversal guidelines
• Two novel index structures
• Route Overlay
– Manage the physical network structure and the shortcuts
• Association Directory
– Manipulate the mappings of objects and object abstracts on nodes, edges, and Rnets
THE ROAD FRAMEWORK
• Preliminaries
• Rnet, Shortcut and Object Abstract
• Rnet Hierarchy
• Route Overlay and Association Directory
Preliminaries
• All LDSQs are assumed to be initiated at nodes
without loss of generality
• In general, each LDSQ is specified with a
distance condition D and an attribute predicate
Rnet, Shortcut and Object Abstract
Rnet Hierarchy
Rnet Hierarchy (contd.)
Route Overlay
• Border nodes in an Rnet are always the border
nodes in some of its child Rnets
• Naturally flattens a hierarchical network into a
plain structure to facilitate search space
expansion over a network
• B+-tree
– All the shortcuts from a border node n to other border nodes are captured by nonleaf entries
– A leaf entry stores all the physical edges from n to its neighboring nodes
Route Overlay (contd.)
Association Directory
• An efficient object lookup mechanism in ROAD
• B+-tree
– With unique node IDs or Rnet IDs as the
search key
– Associated with node n (n’) are objects o in L(n, n’) together with their distances δ(o, n) (δ(o, n’))
• Represent an object abstract with smaller
storage overheads
– Bloom filter[18], signature[19]
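As a rough illustration of how a Bloom filter can serve as a compact object abstract, here is a minimal sketch. The class name, the bit-array size, the number of hash functions, and the use of SHA-256 to derive bit positions are all assumptions for illustration; the slides do not say how ROAD configures its filters.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array packed into one Python integer

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

# Abstract for a hypothetical Rnet that contains one hotel.
rnet_abstract = BloomFilter()
rnet_abstract.add("hotel")
empty_abstract = BloomFilter()
```

A traversal could test `might_contain("hotel")` on an Rnet's abstract and bypass the Rnet when the answer is False. A True answer only means the Rnet may hold a match, so it still has to be explored.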
Association Directory (contd.)
• Object o1 on edge (nf, ng)
• Both Rnet R3b and its parent Rnet R3 that contain objects o1 and o2 are associated with {o1, o2}
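The way an object ends up in the abstracts of both an Rnet and its ancestors can be sketched as follows. The dictionary-based layout, the `associate` helper, and the parent map (including the sibling R3a) are assumptions for illustration, not the Association Directory's actual B+-tree layout.

```python
def associate(abstracts, parent, rnet, obj):
    """Record obj in the abstract of rnet and of every ancestor Rnet."""
    while rnet is not None:
        abstracts.setdefault(rnet, set()).add(obj)
        rnet = parent.get(rnet)

# Assumed hierarchy: R3a and R3b are children of R3, and R3 is a top-level Rnet.
parent = {"R3a": "R3", "R3b": "R3", "R3": None}
abstracts = {}
associate(abstracts, parent, "R3b", "o1")
associate(abstracts, parent, "R3b", "o2")
```

After the two calls, both R3b and R3 carry {o1, o2}, while R3a has no abstract entry and could be bypassed by a search.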
SINGLE-SOURCE LDSQ ALGORITHMS
• NN query: n2
• Objects: o1, o2
• Border nodes: n3, n5, n7, n9, n11
• Since R3b contains objects, a traversal within R3b is needed
• The search only takes three jumps from n3 to n11
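The bypassing idea can be sketched on a toy network. This is a simplified illustration, not the paper's algorithm: it uses a single level of Rnets, places objects on nodes, and assumes each Rnet is a dictionary with hypothetical "edges", "shortcuts", and "objects" fields. The network below is made up and is not the one in the slide's figure.

```python
import heapq

def nn_search(rnets, source):
    """Return (object, distance, nodes_visited) for the nearest object."""
    heap = [(0, source)]
    settled = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in settled:
            continue
        settled.add(node)
        # Object lookup at the visited node.
        for r in rnets.values():
            if r["objects"].get(node):
                return r["objects"][node][0], d, len(settled)
        for r in rnets.values():
            if not r["objects"] and node in r["shortcuts"]:
                # Rnet has no objects: jump over it via shortcuts.
                nbrs = r["shortcuts"][node]
            else:
                # Otherwise traverse the physical edges inside the Rnet.
                nbrs = r["edges"].get(node, [])
            for nbr, w in nbrs:
                if nbr not in settled:
                    heapq.heappush(heap, (d + w, nbr))
    return None

rnets = {
    # Empty Rnet: chain a - x - y - b, border nodes a and b.
    "R1": {
        "edges": {"a": [("x", 1)], "x": [("a", 1), ("y", 1)],
                  "y": [("x", 1), ("b", 1)], "b": [("y", 1)]},
        "shortcuts": {"a": [("b", 3)], "b": [("a", 3)]},
        "objects": {},
    },
    # Rnet with an object o1 at node c.
    "R2": {
        "edges": {"b": [("c", 1)], "c": [("b", 1)]},
        "shortcuts": {},
        "objects": {"c": ["o1"]},
    },
}
result = nn_search(rnets, "a")
```

In this toy run the search reaches o1 at distance 4 after visiting only a, b, and c. The inner nodes x and y of the empty Rnet are never touched.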
SINGLE-SOURCE LDSQ ALGORITHMS (contd.)
• Range query, kNN query
• kNNSearch [17]
• ChoosePath [17]
– Quickly identify appropriate shortcuts and edges to
expand the search range from a node n
– Depth-first traversal order
– If n is a border node, the shortcut tree must have
multiple levels
• RangeSearch
– Resembles the kNNSearch algorithm, except that it ends once the portion of the network within the distance bound has been explored
MULTISOURCE LDSQ ALGORITHMS
• Concurrent Network Expansion
• Rnet Visited Set and Border Node
Visited Set
• Search Algorithm
• A multisource LDSQ finds objects with respect to m query nodes
• A multisource kNN query finds the k objects whose maximum distances from all query nodes are the minimum
Concurrent Network Expansion
• Adopt a concurrent approach that expands a
search space from all query nodes through a
best-first strategy
• According to Lemma 2 and 3, the k first visited
objects are guaranteed to be the answer objects
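A minimal sketch of concurrent expansion over a plain graph (without Rnets, RV, or BV) is given below. One shared priority queue drives all subqueries best-first, and an object counts as visited once every subquery has reached it. The function name, data layout, and toy network are assumptions for illustration only.

```python
import heapq

def multisource_knn(graph, objects, query_nodes, k):
    """Return up to k (object, max_distance) pairs, nearest first."""
    m = len(query_nodes)
    # One shared queue: entries are (distance, subquery index, node).
    heap = [(0, qi, q) for qi, q in enumerate(query_nodes)]
    heapq.heapify(heap)
    settled = set()   # (subquery index, node) pairs already expanded
    reached = {}      # object -> number of subqueries that reached it
    answers = []
    while heap and len(answers) < k:
        d, qi, node = heapq.heappop(heap)
        if (qi, node) in settled:
            continue
        settled.add((qi, node))
        for obj in objects.get(node, []):
            reached[obj] = reached.get(obj, 0) + 1
            if reached[obj] == m:
                # Last subquery to arrive: d is the maximum distance.
                answers.append((obj, d))
        for nbr, w in graph.get(node, []):
            if (qi, nbr) not in settled:
                heapq.heappush(heap, (d + w, qi, nbr))
    return answers[:k]

# Toy line network (hypothetical): n1 -1- na -2- nb -2- nc -1- n5
graph = {
    "n1": [("na", 1)],
    "na": [("n1", 1), ("nb", 2)],
    "nb": [("na", 2), ("nc", 2)],
    "nc": [("nb", 2), ("n5", 1)],
    "n5": [("nc", 1)],
}
objects = {"na": ["oa"], "nb": ["ob"], "nc": ["oc"]}
nearest = multisource_knn(graph, objects, ["n1", "n5"], 1)
```

Because entries leave the queue in nondecreasing distance order, the distance at which the last subquery reaches an object equals that object's maximum distance, so objects are completed in answer order. Here oa and oc are each reached early by one subquery, but ob is the first object reached by both.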
Rnet Visited Set and Border Node Visited Set
• Two subqueries: q1, q2
• Result object: ob
• Note: oa and oc will be traversed first
• An Rnet is worth exploring only if it contains objects of interest and it is reached by all subqueries
– Rnet visited set (RV), Border node visited set (BV)
Search Algorithm
• MultiSourcekNNSearch
• Every entry (ε, d, qi) in Priority Queue P records a node or an object (ε), its distance from nqi (d), and the respective subquery (qi)
• An entry (R,qi) in RV indicates that Rnet R
has been visited by subquery qi
• An entry (R,b,d,qi) in BV records that a
subquery qi has reached Rnet R via the
border node b, and d=||b,nqi||
• Annotations on the MultiSourcekNNSearch pseudocode
– Mark nodes as “unvisited by qi”
– Repeatedly evaluate the head entry from P
– Has been visited: a detailed examination begins
– Associated objects are fetched from AD and enqueued to P for later examination
– Check for backtracking using BV
– Resume the traversal at the border nodes
– Expand the search range
• Visit node n’s shortcut tree in a depth-first order and identify
appropriate shortcuts and edges to expand search range
If R is..
To bypass R
RV: (R3a, q2), (R3b, q2), (R3, q2)
RV: (R1a, q1), (R1, q1)
BV: (R3a, n11, 0, q2), (R3b, n11, 0, q2)
BV: (R2a, n5, 3, q1)
BV: (R2b, n9, 2, q2)
PERFORMANCE EVALUATION
• Index Construction
• Query Performance
– Experiments on Single-Source kNN Query
– Experiments on Single-Source Range Query
– Experiments on Multisource kNN Query
– Experiments on Multisource Range Query
• Index Update
• Evaluation on p and l
PERFORMANCE EVALUATION (contd.)
• Data set
– CA, NA: highways in California and North America
– SF, PRS: streets and roads in San Francisco and Paris
– 100 to 100000 objects
• Comparison
– NetExp (network expansion [7])
– Euclidean (Euclidean distance bound approach [8])
– DistIdx (distance index [6])
– DistBrws (distance browsing [13])
• Performance metrics
– Index construction time
– Index size
– Query processing time
– Index update time
• Evaluation parameters
Index Construction
• Number of objects vs. index construction time (hours) and index size (MB)
• NA highway
• NetExp & Euclidean incur the smallest index
construction times and index size
• DistBrws takes an extremely long time and huge storage
• ROAD takes around 1 hour and 20 MB
• The ideas of query precomputation and materialization of shortest paths between nodes or toward objects are not appealing
Index Construction (contd.)
• Different networks, 10000 objects, 100 Rnets
• NetExp and Euclidean incur the shortest index construction time and the smallest index size
• DistIdx, DistBrws, and ROAD differ in index construction time and size, and ROAD is the best of the three
• DistBrws takes over a month to build the index and needs more than 15 GB
• ROAD incurs a significantly shorter construction time and a smaller index size
Experiments on Single-Source kNN Query
• (a) Euclidean performs the worst because of exhaustive
shortest path searches for a possibly large number of
candidate objects
• (a) DistBrws and DistIdx perform worse due to the excessive
accesses to distance signatures and shortest path quadtrees and slow node-by-node network traversals
• (b) ROAD requires only 33 percent of NetExp’s processing time when 100000 objects are evaluated
• (c) With 10 clusters, ROAD takes only 1 percent of NetExp’s processing time
• (d) When k is increased, ROAD consistently performs the
best due to its strong pruning power
Experiments on Single-Source Range Query
• ROAD consistently outperforms all the others and it benefits
more from a larger network
• Euclidean performs the worst as it has to examine a large
number of candidate objects
• DistBrws and DistIdx do not improve the search
performance since they both suffer from the massive access
overhead for large networks and large numbers of objects
Experiments on Multisource kNN Query
• (a, b, c, d): two-source kNN queries
• (e): multisource NN queries
• DistIdx and DistBrws do not support multisource
LDSQs
• NetExp performs worse due to exploring all the
subnetworks around query points
• Euclidean has to invoke multiple network traversals
to determine the network distances of candidate
objects
Experiments on Multisource Range Query
• As the search range is fixed, the search performance does not change even when the number of objects varies
• This is because a range query has to explore all the nodes/edges within the search range, independent of the number of objects
• Euclidean performs the worst due to exhaustive candidate
object distance searches
Index Update
• The update cost incurred by DistIdx is several orders of
magnitude higher than that of others
• The edge change has almost unobservable impacts on
NetExp and Euclidean
• For DistIdx, the distance signatures of many nodes need
reexamination and update, resulting in large processing
times
• ROAD only needs to update affected shortcuts of certain
border nodes of Rnets
Evaluation on p and l
• p: child Rnet number
• l: level number
• Single-source kNN queries (k=10)
• ROAD performs similarly in terms of query processing
times under different <p,l> pairs
• A smaller l results in a smaller index and a shorter
construction time
• So a smaller l and a larger p are better
CONCLUSION
• The on-going trend of web-based LBSs…
– To accommodate diverse objects
– To support different distance metrics
– To process various LDSQs efficiently
• ROAD
– A clear separation between objects and network
for better system extensibility
– Exploit search space pruning
– Support single-source and multisource LDSQs
• In the future…
– To support continuous queries, skyline queries, and optimal location queries
Angus Comments
• Hierarchical road nets concept
• Comprehensive experiments