slide

Top-k Spatial Joins
Po-Sung
1
Survey
What are top-k spatial joins?
2
What are top-k spatial joins?
Map overlays incur high execution cost
Retrieve the k objects (of A or B) that intersect the largest number of objects of the other data set
Processing such queries is expensive
3
Top-k Spatial Joins
Apply a conventional spatial join
algorithm on the two data sets A and B
Count the number of output pairs in
which each object participates
Return the k objects with the maximum
intersection counts
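The three steps above can be sketched in Python. This is a minimal illustration, not the presenter's code: it assumes objects are axis-aligned rectangles given as (xmin, ymin, xmax, ymax), that ids are unique across both data sets, and it uses a nested-loop join in place of an index-based join algorithm. The names `intersects`, `naive_topk_join`, and the sample data are hypothetical.

```python
import heapq
from collections import Counter

def intersects(r, s):
    # Rectangles are (xmin, ymin, xmax, ymax); touching borders count as intersecting.
    return r[0] <= s[2] and s[0] <= r[2] and r[1] <= s[3] and s[1] <= r[3]

def naive_topk_join(a, b, k):
    """a, b: dicts mapping id -> rectangle. Returns the k (id, count) pairs with the highest counts."""
    counts = Counter()
    for ida, ra in a.items():              # step 1: spatial join (nested loop here)
        for idb, rb in b.items():
            if intersects(ra, rb):
                counts[ida] += 1           # step 2: count the output pairs per object
                counts[idb] += 1
    # step 3: keep the k objects with the maximum intersection counts
    return heapq.nlargest(k, counts.items(), key=lambda kv: (kv[1], kv[0]))

A = {"a1": (0, 0, 10, 10), "a2": (20, 20, 30, 30)}
B = {"b1": (1, 1, 2, 2), "b5": (5, 5, 25, 25), "b10": (8, 8, 12, 12)}
print(naive_topk_join(A, B, 2))
```

The whole join is materialized before any object is ranked, which is the cost the later slides try to avoid.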
4
Top-1 join
Result format: < id , count , IL >
{A1 , 3 , [B1 , B2 , B3]}
{a1 , 3 , [b1 , b5 , b10]}
[Figure: rectangles A1, A2, B1, B2, B3 and objects a1, b1, b5, b10]
5
Definition 1
$maxnum(e) = C^{e.level}$
e is an intermediate entry of Ra
C is the node capacity
e.level is the level of the node that contains e
maxnum(e) is an upper bound for the number of objects in the subtree of e
Example: $maxnum(A1) = 5$
6
Definition 2
$count(e) = \sum_{e_i \in R_b,\ e_i \text{ intersects } e} maxnum(e_i)$
If e is a leaf entry of Ra
• the number of objects of Rb that intersect e
If e is an intermediate entry
• an upper bound of the actual count of any object in e
Example: $count(A1) = 3 \times 5 = 15$
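Definitions 1 and 2 reduce to a few lines of arithmetic. The sketch below assumes that leaf entries (objects) sit at level 0, so that their maxnum is 1, and that the example uses a node capacity of C = 5 with three intersecting entries at level 1; both are readings of the slide example, not stated facts. The function names are hypothetical.

```python
def maxnum(capacity, level):
    """Definition 1: upper bound on the number of objects under an entry at this level."""
    return capacity ** level

def count_bound(intersecting_levels, capacity):
    """Definition 2: sum of maxnum over the entries of Rb that intersect e.

    intersecting_levels holds the level of each intersecting entry.
    """
    return sum(maxnum(capacity, lvl) for lvl in intersecting_levels)

C = 5                                   # assumed node capacity
print(maxnum(C, 1))                     # bound for one level-1 entry
print(count_bound([1, 1, 1], C))        # three intersecting level-1 entries
print(count_bound([0, 0, 0], C))        # three intersecting objects: an exact count
```

With objects at level 0 the same formula gives the exact count for leaf entries and an upper bound for intermediate entries.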
7
Example
A1.IL = [B1 , B2 , B5]
A2.IL = [B5]
B1.IL = [A1]
B5.IL = [A1 , A2]
[Figure: rectangles A1, A2, B1, B2, B5]
8
Example (cont.)
Heap H
E : <e , count , list>
e is the entry (of Ra or Rb)
$count(e) = \sum_{e_i \in e.IL} maxnum(e_i)$
list is e.IL
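A heap of such entries can be sketched with Python's `heapq`, which is a min-heap, so the count is negated to pop the largest count first. The intersection lists come from the example slide; the levels (all 1) and the capacity (5) are assumptions made only for illustration, and `make_heap_entry` is a hypothetical helper.

```python
import heapq

def make_heap_entry(entry_id, il, capacity):
    """il: list of (id, level) pairs. Returns (-count, id, il) so the max count pops first."""
    count = sum(capacity ** level for _, level in il)   # sum of maxnum over e.IL
    return (-count, entry_id, il)

H = []
heapq.heappush(H, make_heap_entry("A2", [("B5", 1)], 5))
heapq.heappush(H, make_heap_entry("A1", [("B1", 1), ("B2", 1), ("B5", 1)], 5))

neg_count, top_id, top_il = heapq.heappop(H)
print(top_id, -neg_count)   # the entry with the largest count bound
```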
9
Example (cont.)
a1.IL = [b1 , b5 , b10]
a1.key = 3
a2.IL = [b5]
a2.key = 1
[Figure: rectangles A1, A2, B1, B2, B3 and objects a1, a2, b1, b5, b10]
10
Pseudocode
TS (Rtree Ra, Rtree Rb, int k)
Join RTa and RTb to get intersecting pairs (ea, eb)
For each entry e that appears in a pair
  build e.IL, compute e.count and insert <e, e.count, e.IL> to a heap H (sorted by e.count)
While number of reported objects < k
  • e = de-heap(H)
  • If e is a leaf entry // actual object
    – Report (<e.id, e.count, e.IL>)
  • Else // e is an intermediate entry pointing to node n
    • For each ei ∈ e.IL
      • Join n and ni // ni is pointed by ei
      • For each intersecting entry pair (e', e'i)
        • Add e'i to e'.IL
    • For each entry e' of n
      • Compute e'.count
      • If e'.count > pruning condition // i.e., count of the k-th best object found so far
        • Insert <e', e'.count, e'.IL> to H
        • If e' is a leaf entry // object
          • Update pruning condition
return
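The pseudocode can be turned into a small runnable Python sketch. It is an illustration under stated assumptions, not the presenter's implementation: trees are plain nested `Entry` objects rather than disk-based R-trees, both trees have the same height, objects sit at level 0, `capacity` is at least the largest fanout, and ties in count are broken arbitrarily. The names `Entry`, `intersects`, `ts`, and the sample data are hypothetical.

```python
import heapq
from itertools import count

class Entry:
    """Tree entry: an object if it has no children, otherwise a pointer to a node."""
    def __init__(self, eid, rect, children=()):
        self.id, self.rect, self.children = eid, rect, list(children)
        self.level = 1 + max(c.level for c in self.children) if self.children else 0

def intersects(r, s):
    # Rectangles are (xmin, ymin, xmax, ymax); touching borders count as intersecting.
    return r[0] <= s[2] and s[0] <= r[2] and r[1] <= s[3] and s[1] <= r[3]

def ts(roots_a, roots_b, k, capacity):
    """roots_a, roots_b: entries of the two root nodes. Returns up to k (id, count, IL ids)."""
    heap, tie = [], count()      # max-heap via negated counts; tie avoids comparing entries
    best = []                    # min-heap with the k highest object counts seen so far

    def bound(il):               # e.count: sum of maxnum = capacity ** level over e.IL
        return sum(capacity ** x.level for x in il)

    def push(e, il):
        heapq.heappush(heap, (-bound(il), next(tie), e, il))

    for side, other in ((roots_a, roots_b), (roots_b, roots_a)):   # join the two roots
        for e in side:
            il = [x for x in other if intersects(e.rect, x.rect)]
            if il:
                push(e, il)

    results = []
    while heap and len(results) < k:
        neg, _, e, il = heapq.heappop(heap)
        if not e.children:                       # leaf entry: the count is exact, report it
            results.append((e.id, -neg, [x.id for x in il]))
            continue
        child_il = {}                            # join node n of e with each node ni in e.IL
        for ei in il:
            for c in e.children:
                for ci in ei.children:
                    if intersects(c.rect, ci.rect):
                        child_il.setdefault(c.id, (c, []))[1].append(ci)
        for c, c_il in child_il.values():
            cnt = bound(c_il)
            prune = best[0] if len(best) == k else 0   # count of the k-th best object so far
            if cnt > prune:
                push(c, c_il)
                if not c.children:               # object: update the pruning condition
                    heapq.heappush(best, cnt)
                    if len(best) > k:
                        heapq.heappop(best)
    return results

a1, a2 = Entry("a1", (0, 0, 10, 10)), Entry("a2", (20, 20, 30, 30))
b1, b5, b10 = Entry("b1", (1, 1, 2, 2)), Entry("b5", (5, 5, 25, 25)), Entry("b10", (8, 8, 12, 12))
A = [Entry("A1", a1.rect, [a1]), Entry("A2", a2.rect, [a2])]
B = [Entry("B1", (1, 1, 12, 12), [b1, b10]), Entry("B2", b5.rect, [b5])]
top2 = ts(A, B, k=2, capacity=2)
print(top2)
```

In this run the children of A2 and B1 never enter the heap, because their counts do not exceed the count of the second-best object already found.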
11
Algorithm
Visiting order
count
Pruning condition
12
Multiple Expansions Method (ME)
13
Two binary search trees
14
Full join vs. semijoin
15
Comparison environment
1. MCB x LA returns 16,477,244 intersection pairs
2. SKEW x LA returns 19,657,973 intersection pairs
16
Node accesses versus k (full join).
17
CPU time versus k (full join).
18
Total cost versus k (full join, 10 percent cache).
19
Node accesses versus k (semijoin).
20
CPU time versus k (semijoin).
21
Total cost versus k (semijoin, 10 percent cache).
22
Conclusions
Bottom-k queries
Top-k distance (semi) join
Top-k nearest neighbor (semi) joins
Computing the NN (in A) of all objects of B
Sorting the resulting pairs (ob, oa), where oa ∈ A is the NN of ob ∈ B, with respect to oa
Reporting the top-k objects of A
23
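The three steps of the top-k nearest neighbor semijoin can be sketched by brute force. This is only an illustration: it assumes objects are 2D points, uses Euclidean distance, scans all of A for each object of B instead of using an index, and groups the (ob, oa) pairs with a counter rather than an explicit sort. The function name and sample data are hypothetical.

```python
import math
from collections import Counter

def topk_nn_semijoin(a, b, k):
    """a, b: dicts mapping id -> (x, y). Returns the k (id, count) pairs of A
    that are the nearest neighbor of the most objects of B."""
    nn_of = {}
    for idb, pb in b.items():
        # step 1: NN (in A) of each object of B; ties broken by id
        nn_of[idb] = min(a, key=lambda ida: (math.dist(a[ida], pb), ida))
    counts = Counter(nn_of.values())          # step 2: group the (ob, oa) pairs by oa
    # step 3: report the top-k objects of A
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

A_pts = {"a1": (0, 0), "a2": (10, 10)}
B_pts = {"b1": (1, 1), "b2": (2, 0), "b3": (9, 9)}
print(topk_nn_semijoin(A_pts, B_pts, 1))
```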