The Optimal-Location Query

Progressive Computation of the Min-Dist Optimal-Location Query
Donghui Zhang, Yang Du, Tian Xia, Yufei Tao*
Northeastern University
* Chinese University of Hong Kong
VLDB’06, Seoul, Korea
Motivation
• “What is the optimal location in the Boston area to build a new McDonald’s store?”
• Suppose each customer drives to the closest McDonald’s.
• Optimality: minimize the average driving distance (AD).
Donghui Zhang et al.
Optimal Location Query
2
min-dist OL
[Figure: four customers whose distances to their nearest existing site are 200, 200, 600, and 600]
• Without any new site:
AD = (200+200+600+600)/4 = 400.
[Figure: new site l1 lies 30 from two of the customers; the other two remain 600 from their nearest site]
• With new site l1:
AD(l1) = (30+30+600+600)/4 = 315.
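[Figure: new site l2 lies 30 from the two customers that were 600 away; the other two remain 200 from their nearest site]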
• With new site l2:
AD(l2) = (200+200+30+30)/4 = 115.
Formal Definition
• Given a set S of sites, a set O of objects, and a query range Q,
• min-dist OL is a location l ∈ Q which minimizes
$$AD(l) = \frac{1}{|O|} \sum_{o \in O} d_{NN}(o, S \cup \{l\})$$
where $d_{NN}(o, S \cup \{l\})$ is the distance between o and its nearest site.
L1 Distance
• d(o, s) = |o.x – s.x|+|o.y – s.y|
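The definition of AD(l) and the L1 distance can be combined into a brute-force evaluator. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, and the coordinates are made up so that the distances reproduce the 400 / 315 / 115 example from the motivation slides.

```python
def l1(a, b):
    """L1 (Manhattan) distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def d_nn(o, sites):
    """Distance between object o and its nearest site."""
    return min(l1(o, s) for s in sites)

def ad(objects, sites, l=None):
    """Average distance AD; if l is given, it is added as a new site, giving AD(l)."""
    pool = list(sites) if l is None else list(sites) + [l]
    return sum(d_nn(o, pool) for o in objects) / len(objects)

# Hypothetical coordinates: one existing site and four customers.
sites = [(0, 0)]
objects = [(200, 0), (170, 30), (-600, 0), (-570, -30)]

print(ad(objects, sites))              # 400.0 (no new site)
print(ad(objects, sites, (170, 0)))    # 315.0 (new site l1)
print(ad(objects, sites, (-570, 0)))   # 115.0 (new site l2)
```

This scans every object for every location, so it only serves to make the definition concrete.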
Challenges
1. There are infinitely many locations in Q. How can we produce a finite set of candidates while preserving optimality?
2. How can we avoid computing AD(l) for all candidates?
Solution Highlights
1. Algorithm to compute AD(l).
2. Theorems to limit #candidates.
3. Lower bound on AD(l) for all locations l in a cell C.
4. Progressive algorithm.
1. Compute AD(l)
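• Remember $AD(l) = \frac{1}{|O|} \sum_{o \in O} d_{NN}(o, S \cup \{l\})$
• Define $AD = \frac{1}{|O|} \sum_{o \in O} d_{NN}(o, S)$
• Let RNN(l) be the objects “attracted” by l, i.e. the objects closer to l than to their nearest site in S.
• AD(l) = AD if RNN(l) = ∅
[Figure: a location l with RNN(l) = ∅, so AD(l) = AD]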
[Figure: a location l with RNN(l) = {o7, o8}, so AD(l) < AD]
• Otherwise, AD(l) = AD − ?, where “?” is the average savings for the customers in RNN(l).
1. Compute AD(l)
• Theorem:
$$AD(l) = AD - \frac{1}{|O|} \sum_{o \in RNN(l)} \left( d_{NN}(o, S) - d(o, l) \right)$$
• S and O are “static” versus l.
– AD can be pre-computed.
– So can dNN(o, S) for every object o.
• To compute AD(l):
– Find RNN(l).
– For each o ∈ RNN(l), compute d(o, l).
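The theorem can be sketched as follows. This is an illustration under stated assumptions: RNN(l) is found by a linear scan here (the slides use an R*-tree of objects), and all function names and coordinates are hypothetical.

```python
def l1(a, b):
    """L1 (Manhattan) distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def precompute(objects, sites):
    """Return (dnn, ad0): d_NN(o, S) for every object, and AD without any new site."""
    dnn = [min(l1(o, s) for s in sites) for o in objects]
    return dnn, sum(dnn) / len(objects)

def rnn(objects, dnn, l):
    """Indices of the objects that are strictly closer to l than to their nearest site."""
    return [i for i, o in enumerate(objects) if l1(o, l) < dnn[i]]

def ad_of(objects, dnn, ad0, l):
    """AD(l) = AD - (1/|O|) * sum over RNN(l) of (d_NN(o, S) - d(o, l))."""
    saving = sum(dnn[i] - l1(objects[i], l) for i in rnn(objects, dnn, l))
    return ad0 - saving / len(objects)

# Hypothetical data: one site, four customers at distances 200, 200, 600, 600.
sites = [(0, 0)]
objects = [(200, 0), (170, 30), (-600, 0), (-570, -30)]
dnn, ad0 = precompute(objects, sites)

print(rnn(objects, dnn, (170, 0)))         # [0, 1]
print(ad_of(objects, dnn, ad0, (170, 0)))  # 315.0
```

Only the objects in RNN(l) are touched per candidate; the pre-computed values are shared by all candidates.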
2. Limit #candidates
• Theorem: within the X/Y range of Q, draw horizontal and vertical grid lines through the objects. Only the intersections need to be considered!
[Figure: query range Q with grid lines through the objects]
[Figure: the grid lines within Q intersect at 5x6=30 candidates]
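A minimal sketch of candidate generation from the theorem above. Two assumptions are made here that the slides do not spell out: Q is an axis-aligned rectangle given by two opposite corners, and the borders of Q are added as grid lines so that the corners of Q are candidates too. The function name and the data are hypothetical.

```python
from itertools import product

def candidates(objects, q):
    """Intersections of the grid lines drawn through objects within Q's X/Y range."""
    (x1, y1), (x2, y2) = q
    # Vertical lines: x-coordinates of objects inside Q's X range, plus Q's borders.
    xs = sorted({x1, x2} | {o[0] for o in objects if x1 <= o[0] <= x2})
    # Horizontal lines: y-coordinates of objects inside Q's Y range, plus Q's borders.
    ys = sorted({y1, y2} | {o[1] for o in objects if y1 <= o[1] <= y2})
    return list(product(xs, ys))

objects = [(1, 5), (2, 9), (3, -4), (8, 2)]
q = ((0, 0), (4, 4))
print(len(candidates(objects, q)))  # 15 (5 vertical lines x 3 horizontal lines)
```

Note that an object outside Q still contributes a vertical line when its x-coordinate falls inside Q's X range, and likewise for horizontal lines.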
2. Limit #candidates
• Proof idea: suppose the OL is not at an intersection. Then moving it to one produces a better (or equal) result.
[Figure: a location l inside a grid cell, with a shift δ]
• Consider RNN(l).
• Move to the right → saves total distance.
2. VCU(Q)
• A spatial region enclosing the objects that are closer to Q than to the sites in S.
• It is the Voronoi cell of Q versus the sites in S.
2. Further Limit #candidates
• Only consider objects in VCU(Q).
[Figure: using all objects, 5x6=30 candidates]
[Figure: using only the objects in VCU(Q), 4x4=16 candidates]
Naïve Algorithm
• Derive candidates.
• Compute AD(l) for each.
• Pick smallest.
• Not efficient! Too many candidates!
To compute AD(l) for each one, we need to:
• compute RNN(l)
• retrieve all these objects…
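The naïve algorithm can be sketched in a few lines. This is an illustration only: it assumes Q is an axis-aligned rectangle whose borders also act as grid lines, it omits the VCU(Q) filter, and it evaluates AD(l) by scanning all objects. The names and coordinates are hypothetical.

```python
from itertools import product

def l1(a, b):
    """L1 (Manhattan) distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def naive_ol(objects, sites, q):
    """Derive all candidates, compute AD(l) for each, and pick the smallest."""
    (x1, y1), (x2, y2) = q
    xs = sorted({x1, x2} | {o[0] for o in objects if x1 <= o[0] <= x2})
    ys = sorted({y1, y2} | {o[1] for o in objects if y1 <= o[1] <= y2})
    dnn = [min(l1(o, s) for s in sites) for o in objects]  # d_NN(o, S)

    def ad(l):
        # Each object goes to l only if l is closer than its nearest existing site.
        return sum(min(d, l1(o, l)) for o, d in zip(objects, dnn)) / len(objects)

    best = min(product(xs, ys), key=ad)
    return best, ad(best)

sites = [(0, 0)]
objects = [(200, 0), (170, 30), (-600, 0), (-570, -30)]
loc, value = naive_ol(objects, sites, ((-700, -100), (300, 100)))
print(value)  # 115.0
```

The cost is one full AD(l) evaluation per candidate, which is what the progressive approach tries to avoid.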
Progressive Idea
• Treat Q as a cell and consider its
corners.
• Divide the cell.
• Recursively divide a sub-cell.
• Able to check all candidates.
• Q: What do you save?
• A: Cell pruning. A cell is pruned if its lower bound ≥ AD(l0) for some candidate l0.
• e.g. suppose AD(l0) = 50 and 60 is a lower bound for AD(l), ∀ l ∈ C. Then C is pruned.
3. LB(C): lower bound for AD(l), ∀ l ∈ C
[Figure: a cell C with corners c1, c2, c3, c4, where AD(c1)=1000, AD(c2)=3000, AD(c3)=4000, AD(c4)=2500]
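• Theorem:
$$LB(C) = \max\left\{ \frac{AD(c_1) + AD(c_4)}{2}, \frac{AD(c_2) + AD(c_3)}{2} \right\} - \frac{p}{4}$$
is a lower bound, where p is the perimeter of C.
• e.g. with AD(c1)=1000, AD(c2)=3000, AD(c3)=4000, AD(c4)=2500: LB(C) = 3500 − p/4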
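A small sketch of this bound, using the corner values from the example. The function name and the perimeter value of 400 are hypothetical; c1/c4 and c2/c3 are taken to be the two pairs of diagonally opposite corners, as the pairing in the theorem suggests.

```python
def lower_bound(ad_c1, ad_c2, ad_c3, ad_c4, perimeter):
    """LB(C) = max{(AD(c1)+AD(c4))/2, (AD(c2)+AD(c3))/2} - p/4."""
    return max((ad_c1 + ad_c4) / 2, (ad_c2 + ad_c3) / 2) - perimeter / 4

# Corner values from the slide; the perimeter is an assumed value.
print(lower_bound(1000, 3000, 4000, 2500, perimeter=400))  # 3400.0
```

The smaller the cell, the smaller p/4 becomes, so the bound tightens as cells are subdivided.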
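• A better lower bound Theorem:
$$LB(C) = \max\left\{ \frac{AD(c_1) + AD(c_4)}{2}, \frac{AD(c_2) + AD(c_3)}{2} \right\} - \frac{p}{4} \cdot \frac{|VCU(C)|}{|O|}$$
where |VCU(C)| is the number of objects in VCU(C).
• Compared with the previous lower bound:
  • Higher quality, since the lower bound is larger.
  • More computation.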
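The better bound differs only in the factor |VCU(C)|/|O| applied to p/4. A minimal sketch follows; the function name, the perimeter, and the object counts are hypothetical values chosen for illustration.

```python
def lower_bound_vcu(ad_c1, ad_c2, ad_c3, ad_c4, perimeter, n_vcu, n_objects):
    """LB(C) = max{(AD(c1)+AD(c4))/2, (AD(c2)+AD(c3))/2} - (p/4) * |VCU(C)|/|O|."""
    base = max((ad_c1 + ad_c4) / 2, (ad_c2 + ad_c3) / 2)
    return base - (perimeter / 4) * (n_vcu / n_objects)

# Same corner values as before; 250 of 1000 objects are assumed to lie in VCU(C).
print(lower_bound_vcu(1000, 3000, 4000, 2500, 400, 250, 1000))  # 3475.0
```

With the same assumed perimeter, the simpler bound max{...} − p/4 gives 3400, so this one is larger. The extra cost is that |VCU(C)| has to be computed for each cell.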
4. The Progressive Algorithm
1. Maintain a heap of cells ordered by LB(). Initially it holds one cell: Q.
2. Maintain the best candidate lopt found so far.
3. Pick the cell with the minimum LB() and partition it.
4. Compute AD() for the corners of the sub-cells, updating lopt if a corner is better.
5. Compute LB() for the sub-cells.
6. Insert sub-cell ci into the heap if LB(ci) < AD(lopt).
7. Go to 3 (repeat until the heap is empty).
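The steps above can be sketched as follows. This is a simplified illustration, not the authors' implementation. It assumes Q is an axis-aligned rectangle whose borders act as grid lines, evaluates AD(l) by a linear scan instead of an R*-tree, uses the simpler lower bound max{...} − p/4, and splits each cell into at most four sub-cells along grid lines. All names and coordinates are hypothetical.

```python
import heapq
from itertools import product

def l1(a, b):
    """L1 (Manhattan) distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def progressive_ol(objects, sites, q):
    (x1, y1), (x2, y2) = q
    xs = sorted({x1, x2} | {o[0] for o in objects if x1 <= o[0] <= x2})
    ys = sorted({y1, y2} | {o[1] for o in objects if y1 <= o[1] <= y2})
    dnn = [min(l1(o, s) for s in sites) for o in objects]  # d_NN(o, S)
    cache = {}

    def ad(i, j):
        """AD of the candidate at grid indices (i, j), memoized."""
        if (i, j) not in cache:
            l = (xs[i], ys[j])
            cache[(i, j)] = sum(min(d, l1(o, l)) for o, d in zip(objects, dnn)) / len(objects)
        return cache[(i, j)]

    def corners(cell):
        i0, i1, j0, j1 = cell
        return [(i0, j0), (i0, j1), (i1, j0), (i1, j1)]

    def lb(cell):
        """max of the two diagonal-corner averages, minus perimeter / 4."""
        i0, i1, j0, j1 = cell
        p = 2 * ((xs[i1] - xs[i0]) + (ys[j1] - ys[j0]))
        return max((ad(i0, j0) + ad(i1, j1)) / 2, (ad(i0, j1) + ad(i1, j0)) / 2) - p / 4

    def split(lo, hi):
        """Halve an index range along a grid line, if it has an interior line."""
        if hi - lo <= 1:
            return [(lo, hi)]
        mid = (lo + hi) // 2
        return [(lo, mid), (mid, hi)]

    root = (0, len(xs) - 1, 0, len(ys) - 1)
    best = min(corners(root), key=lambda c: ad(*c))   # step 2
    heap = [(lb(root), root)]                         # step 1
    while heap:
        bound, cell = heapq.heappop(heap)             # step 3
        if bound >= ad(*best):
            break  # every remaining cell has an LB at least this large
        i0, i1, j0, j1 = cell
        subs = [(a, b, c, d) for (a, b), (c, d) in product(split(i0, i1), split(j0, j1))]
        if len(subs) == 1:
            continue  # no interior candidates; corners were already evaluated
        for sub in subs:                              # step 4
            for corner in corners(sub):
                if ad(*corner) < ad(*best):
                    best = corner
        for sub in subs:                              # steps 5 and 6
            sub_lb = lb(sub)
            if sub_lb < ad(*best):
                heapq.heappush(heap, (sub_lb, sub))
    return (xs[best[0]], ys[best[1]]), ad(*best)

sites = [(0, 0)]
objects = [(200, 0), (170, 30), (-600, 0), (-570, -30)]
loc, value = progressive_ol(objects, sites, ((-700, -100), (300, 100)))
print(value)  # 115.0
```

At any point in the loop, the smallest LB in the heap and AD of the best candidate bracket the optimal AD, which is what allows early termination.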
Progressiveness
• The algorithm quickly reports a candidate OL with a confidence interval, and keeps refining.
[Figure: interval plotted against time; AD(real OL) is inside the interval, initially bounded above by AD(best corner of Q) and below by LB(Q)]
[Figure: as more candidates are examined, the upper end of the interval becomes AD(best candidate); the lower end is LB(Q); AD(real OL) is inside the interval]
[Figure: over time the interval is bounded above by AD(best candidate) and below by Min{ LB(C) | C in heap }; AD(real OL) is inside the interval]
• User may choose to terminate any time.
Batch Partitioning
• To partition a cell, partition it into multiple sub-cells at once.
• Reason: computing AD(l) requires accessing the R*-tree of objects. While accessing the R*-tree, we want to compute multiple AD(l) values.
• Tradeoff: partitioning too much is wasteful, since some candidates could have been pruned.
Performance Setup
• O: 123,593 postal addresses in the northeastern part of the US, stored using an R*-tree.
• S: 100 sites randomly selected from O.
• Buffer: 128 pages.
• Dell Pentium IV 3.2GHz.
• Query size: 1% in each dimension.
Effect of VCU Computation
Comparison of Lower Bounds
Effect of Batch Partitioning
Progressiveness
• Each step: partition a cell into 40 sub-cells.
• After 200 steps, the accurate answer is found.
• After 20 steps, the answer is within 1% of the optimal.
Conclusions
• Introduced the min-dist optimal-location query.
• Proved theorems to limit the number
of candidates.
• Presented lower-bound estimators.
• Proposed a progressive algorithm.