Slides - University of Melbourne

Motivation and Related Work
Query kNNTA and Index TAR-tree
Entry Grouping Strategies
Experiments and Conclusion
K-Nearest Neighbor Temporal Aggregate
Queries
Yu Sun †, Jianzhong Qi †, Yu Zheng ‡, Rui Zhang †
† Department of Computing and Information Systems, University of Melbourne
‡ Microsoft Research, Beijing
March 26th 2015
Yu Sun, Jianzhong Qi, Yu Zheng and Rui Zhang
K-Nearest Neighbor Temporal Aggregate Queries
Outline
1 Motivation and Related Work
    Motivating Examples
    Previous Work
2 Query kNNTA and Index TAR-tree
    Definition of kNNTA
    Structure and Usage of TAR-tree
3 Entry Grouping Strategies
    The Proposed Strategy
    Analysis of Different Strategies
4 Experiments and Conclusion
Examples in Our Daily Life
Find some attractions within walking distance
Find a nearby club where lots of people are gathering now
Find a good restaurant that is not far away and has few customers now
Answering Such Questions Has Wide Applications
Foursquare or Facebook: places nearby
Flickr or Instagram: photos taken nearby having many Likes
Urban computing
No Existing Queries Can Effectively Answer Such Questions
Ranking locations on
    Spatial distance
    Temporal aggregate on visits or likes
Range aggregate does not work
No Existing Algorithms or Indexes Can Efficiently Support Such Rankings
Characteristics of such applications
    Visits or likes arrive continuously
    Periods of interest range from hours to years
    Users explore results under different preferences
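A Weighted Sum of the Spatial Distance and Temporal Aggregate
The ranking function
$f(p) = \alpha \, d(p, q) + (1 - \alpha) \, (1 - g(p, I_q))$
where d(p, q) is the spatial distance from location p to the query point q, g(p, I_q) is the temporal aggregate of p over the query time interval I_q, and α weights the two terms
K-Nearest Neighbor Temporal Aggregate Queries (kNNTA): returns the k locations with the minimum ranking scores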
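The ranking function can be illustrated with a brute-force sketch. It assumes both terms are normalised to [0, 1] by dividing by the largest distance and the largest aggregate among the candidates; the names `Location`, `score`, `knnta_bruteforce`, `d_max`, `g_max` and the per-epoch count layout are illustrative assumptions, not part of the slides.

```python
import heapq
import math
from typing import NamedTuple, Sequence


class Location(NamedTuple):
    name: str
    x: float
    y: float
    counts: Sequence[int]  # hypothetical layout: one visit count per epoch


def score(p, q, epochs, alpha, d_max, g_max):
    """f(p) = alpha * d(p, q) + (1 - alpha) * (1 - g(p, I_q)), both terms normalised."""
    d = math.hypot(p.x - q[0], p.y - q[1]) / d_max
    g = sum(p.counts[i] for i in epochs) / g_max
    return alpha * d + (1 - alpha) * (1 - g)


def knnta_bruteforce(locations, q, epochs, k, alpha):
    """Scan every location and return the k (score, name) pairs with the smallest scores."""
    # Fall back to 1.0 so that degenerate inputs do not divide by zero.
    d_max = max(math.hypot(p.x - q[0], p.y - q[1]) for p in locations) or 1.0
    g_max = max(sum(p.counts[i] for i in epochs) for p in locations) or 1.0
    scored = [(score(p, q, epochs, alpha, d_max, g_max), p.name) for p in locations]
    return heapq.nsmallest(k, scored)


# Synthetic toy data, not taken from the slides.
places = [
    Location("near_quiet", 1.0, 0.0, [1, 0, 1]),
    Location("far_busy", 3.0, 4.0, [5, 4, 3]),
]
print(knnta_bruteforce(places, q=(0.0, 0.0), epochs=range(3), k=1, alpha=0.3))
print(knnta_bruteforce(places, q=(0.0, 0.0), epochs=range(3), k=1, alpha=0.9))
```

With a small α the aggregate dominates and the busy location wins; with a large α the nearby location wins, which is the preference trade-off the weight is meant to expose.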
A Query Example
[Figure: map of locations a to l around the query point q, with tables of values at times t0, t1 and t2]
Query: q, [t0 , now], α = 0.3, k = 1
$f(e) = 0.3 \cdot \frac{2.24}{15.6} + (1 - 0.3) \cdot (1 - \frac{2}{12}) = 0.626$
$f(f) = 0.3 \cdot \frac{3}{15.6} + (1 - 0.3) \cdot (1 - \frac{12}{12}) = 0.058$
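The two scores in this example can be re-derived with a few lines of arithmetic. The sketch assumes the operands are distance 2.24 and aggregate 2 for e, and distance 3 and aggregate 12 for f, with 15.6 and 12 as the normalisers; this is the assignment that reproduces the printed results 0.626 and 0.058.

```python
alpha = 0.3
d_max, g_max = 15.6, 12  # normalisers as they appear in the example


def f(distance, aggregate):
    """Ranking score with distance and aggregate normalised by d_max and g_max."""
    return alpha * distance / d_max + (1 - alpha) * (1 - aggregate / g_max)


f_e = f(2.24, 2)   # location e
f_f = f(3, 12)     # location f
print(round(f_e, 3), round(f_f, 3))  # 0.626 0.058

# With k = 1 the query returns the location with the smaller score.
print(min([("e", f_e), ("f", f_f)], key=lambda pair: pair[1])[0])  # f
```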
A Straightforward Approach May Encounter Very High Cost
Number of locations in Foursquare is 60 million
Number of records for each location is 525,600
Much more for applications like Instagram or Twitter
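As a rough sense of scale, multiplying the two figures on this slide gives the number of records a full scan could touch in the worst case. The variable names below are illustrative, and the final check is an observation about the number 525,600 rather than a claim made on the slide.

```python
locations = 60_000_000          # Foursquare locations, from the slide
records_per_location = 525_600  # records per location, from the slide

total_records = locations * records_per_location
print(f"{total_records:,} records in the worst case")  # 31,536,000,000,000

# Observation only: 525,600 equals the number of minutes in a 365-day year.
print(records_per_location == 365 * 24 * 60)  # True
```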
A Brief Reminder of R-tree
[Figure: example R-tree over locations a to l, showing the MBRs R1 to R7 and the corresponding tree]
Basic Structure of TAR-tree
[Figure: TAR-tree over locations a to l with nodes R1 to R7; entries are linked to temporal indexes holding tuples such as (2,3,2) and (3,5,4)]
kNNTA Query Processing using TAR-tree
Best-First Search (priority queue after each step, scores in parentheses):
R6 (0.000), R7 (0.427)
R1 (0.058), R2 (0.352), R3 (0.408), R7 (0.427)
f (0.058), R2 (0.352), R3 (0.408), R7 (0.427)
[Figure: TAR-tree with nodes R1 to R7 and the temporal index tuples consulted during the search]
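A minimal sketch of best-first search over an aggregate-annotated tree is given here, under stated assumptions: each node carries a bounding box and an upper bound on the aggregate of anything beneath it, and the temporal index is collapsed to a single number for the query interval. The `Node` class, `lower_bound` and the toy tree are hypothetical and are not the TAR-tree implementation.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical tree node: a bounding box plus an aggregate upper bound."""
    name: str
    mbr: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
    max_agg: float                          # largest aggregate of any location below
    children: List["Node"] = field(default_factory=list)  # empty for a location


def min_dist(q, mbr):
    """Smallest Euclidean distance from point q to the rectangle mbr."""
    dx = max(mbr[0] - q[0], 0.0, q[0] - mbr[2])
    dy = max(mbr[1] - q[1], 0.0, q[1] - mbr[3])
    return math.hypot(dx, dy)


def lower_bound(node, q, alpha, d_max, g_max):
    """No location under `node` can score below this value."""
    return alpha * min_dist(q, node.mbr) / d_max + (1 - alpha) * (1 - node.max_agg / g_max)


def best_first_knnta(root, q, k, alpha, d_max, g_max):
    tie = itertools.count()  # tie-breaker so the heap never compares Node objects
    heap = [(lower_bound(root, q, alpha, d_max, g_max), next(tie), root)]
    results = []
    while heap and len(results) < k:
        bound, _, node = heapq.heappop(heap)
        if not node.children:        # a location: its bound is its exact score
            results.append((node.name, bound))
            continue
        for child in node.children:  # expand the most promising node first
            heapq.heappush(heap, (lower_bound(child, q, alpha, d_max, g_max), next(tie), child))
    return results


# Toy tree with synthetic coordinates; a location is a node with a degenerate box.
f = Node("f", (3, 0, 3, 0), 12)
c = Node("c", (4, 0, 4, 0), 5)
e = Node("e", (2, 1, 2, 1), 2)
a = Node("a", (1, 2, 1, 2), 1)
r1 = Node("R1", (3, 0, 4, 0), 12, [f, c])
r2 = Node("R2", (1, 1, 2, 2), 2, [e, a])
root = Node("Root", (1, 0, 4, 2), 12, [r1, r2])

print(best_first_knnta(root, q=(0, 0), k=1, alpha=0.3, d_max=15.6, g_max=12))
```

Because every node's bound is no larger than the score of any location beneath it, the first k locations popped from the heap are the k best, and subtrees with large bounds are never opened.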
Maintenance of TAR-tree
Insert visits or likes
Insert location
Re-insert
Node split
[Figure: example TAR-tree with nodes R1 to R7 over locations a to l]
The Importance of Entry Grouping Strategy
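[Figure: two groupings of the same entries, one using R6, R7 and R2, the other using R6 and R3]
Grouping 1 visits Root, R6, R7, R2: four node accesses
Grouping 2 visits Root, R6, R3: three node accesses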
Properly Integrating the Spatial Distance and Temporal Aggregate
Straightforward one: Use the spatial information
Straightforward two: Use the aggregate distribution
    (1, 0, 1) with (1, 1, 0)
    (1, 0, 1) not with (10, 8, 9)
Proposed: 3D MBR in R*-tree
    two spatial dimensions
    third is the aggregate dimension
Coordinate of the aggregate dimension
$b_p = \frac{1}{\lambda} \cdot \frac{1}{m} \sum_{i=1}^{m} v_i$
Example: (1, 0, 1) → 0.67 and (10, 2, 9, 8) → 7.25
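The aggregate coordinate can be sketched as a scaled mean of the per-epoch values. Treating λ as a plain scaling parameter with a default of 1 is an assumption; that default is what reproduces the two examples on the slide. The function names are illustrative.

```python
from typing import Sequence, Tuple


def aggregate_coordinate(values: Sequence[float], lam: float = 1.0) -> float:
    """b_p = (1 / lam) * mean(v_1 .. v_m); lam = 1 is an assumed default."""
    return sum(values) / len(values) / lam


def to_3d_point(x: float, y: float, values: Sequence[float], lam: float = 1.0) -> Tuple[float, float, float]:
    """Two spatial coordinates plus the aggregate coordinate, ready for a 3D index."""
    return (x, y, aggregate_coordinate(values, lam))


print(round(aggregate_coordinate([1, 0, 1]), 2))  # 0.67
print(aggregate_coordinate([10, 2, 9, 8]))        # 7.25
```

Locations with similar aggregate levels then sit close together along the third axis, so a 3D grouping can take both proximity and popularity into account.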
Power-law Distribution of the Aggregate Data
Roughly 80% of the visits are at 20% of the locations
[Figure: log-log plots of Pr(X ≥ x) against x for the (a) NYC, (b) LA, (c) GW and (d) GS data sets]
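Both quantities on this slide, the share of visits held by the most-visited locations and the empirical Pr(X ≥ x), can be computed from a list of per-location visit counts. The sketch below uses synthetic counts; `top_share` and `ccdf` are illustrative names and the numbers are not from the NYC, LA, GW or GS data sets.

```python
from collections import Counter


def top_share(visits_per_location, top_fraction=0.2):
    """Fraction of all visits that fall on the most-visited `top_fraction` of locations."""
    ranked = sorted(visits_per_location, reverse=True)
    top_n = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:top_n]) / sum(ranked)


def ccdf(visits_per_location):
    """Empirical Pr(X >= x) for each distinct visit count x."""
    n = len(visits_per_location)
    counts = Counter(visits_per_location)
    result, at_least = {}, n
    for x in sorted(counts):
        result[x] = at_least / n   # locations with a count of at least x
        at_least -= counts[x]
    return result


toy = [100, 60, 5, 4, 3, 3, 2, 1, 1, 1]  # synthetic, heavily skewed counts
print(f"top 20% of locations hold {top_share(toy):.0%} of visits")
print(ccdf(toy))
```

Plotting the `ccdf` values against x on log-log axes gives the kind of curve shown in the figure; a roughly straight line suggests power-law behaviour.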
Estimation of the Query Search Region and Number of Node Accesses
[Figure: locations a to l and a point g′ plotted with the aggregate dimension as one axis; marked values 0.83, 0.50, 0.00]
Conic shape search region
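One way to see why the search region is conic: solving f(p) ≤ f(p_k) for the distance term gives d ≤ (f(p_k) − (1 − α)(1 − g)) / α, so the admissible radius grows linearly with the aggregate g. The sketch below assumes normalised distance and aggregate; `search_radius` is an illustrative name, and whether the slides' estimation uses exactly this form is an assumption.

```python
def search_radius(f_k: float, g_norm: float, alpha: float) -> float:
    """Largest normalised distance at which a location with normalised aggregate
    g_norm can still score no worse than f_k; 0.0 if no such location exists."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    return max(0.0, (f_k - (1 - alpha) * (1 - g_norm)) / alpha)


# The radius widens as the aggregate increases, tracing out a cone.
for g in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(g, round(search_radius(f_k=0.3, g_norm=g, alpha=0.3), 3))
```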
[Figure: nodes and the search region along the aggregate dimension]
Power-law like node extents
Bad Behavior of the Two Straightforward Strategies
[Figure: node extents over locations a to l under spatial grouping and under aggregate grouping]
Experiment Set Up
Table: Data Set
Name  Time             Locations  Check-ins
NYC   05/2008-06/2011     72,626    237,784
LA    02/2009-07/2011     45,591    127,924
GW    02/2009-10/2010  1,280,969  6,442,803
GS    01/2011-07/2011    182,968  1,385,223
Temporal index: Multi-version B-tree
Desktop with 3.40GHz CPU and 16GB RAM
Results are averaged over 1,000 queries
By default k = 10 and α = 0.3.
Validation of the Analysis
[Figure: measured vs. estimated f(pk) and node accesses, varying k and α0]
Performance of the TAR-tree
[Figure: CPU time (ms) of baseline, IND-agg, IND-spa and TAR-tree when varying α0, k, time (20% to 100%) and epoch length (1 to 28 days)]
Conclusion
The kNNTA query can provide highly customized location retrieval and has wide applications.
The TAR-tree index efficiently processes the kNNTA query.
Questions?