slides - Abolfazl Asudeh

Efficient Computation of
Regret-ratio Minimizing Set:
A Compact Maxima Representative
ABOLFAZL ASUDEH
UNIVERSITY OF TEXAS AT ARLINGTON
AZADE NAZI
UNIVERSITY OF TEXAS AT ARLINGTON
NAN ZHANG
GEORGE WASHINGTON UNIVERSITY
GAUTAM DAS
UNIVERSITY OF TEXAS AT ARLINGTON
SIGMOD’17 © 2017 ACM. ISBN 978-1-4503-4197-4/17/05
Outline
Motivation and Problem statement
2D-RRMS (Two-Dimensional Regret-Ratio Minimizing Set)
HD-RRMS (Higher-Dimensional Regret-Ratio Minimizing Set)
Experiments
Maxima Queries
… to give the best trade-off between
price, duration, number of stops, …
$f = \sum w_i A_i$
Example
[Figure: scatter plot of tuples $t_i$ on axes X and Y (both from 0 to 1); the tuple maximizing $f = x + y$ is marked.]
Example
Convex hull (sky convex)
[Figure: scatter plot on axes X and Y (both from 0 to 1) with the convex hull points marked.]
Example
A subset of the skyline:
the set of non-dominated points
[Figure: scatter plot on axes X and Y (both from 0 to 1) with the marked points shown among the skyline points.]
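The notion of non-dominated points can be illustrated with a minimal Python sketch. It assumes larger attribute values are better; the function names `dominates` and `skyline` and the sample points are illustrative, not taken from the slides. The quadratic scan is chosen for readability, not speed.

```python
def dominates(a, b):
    """True if tuple a is at least as good as b on every attribute
    and strictly better on at least one (larger is better, by assumption)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(points):
    """Return the non-dominated points, in their original order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical sample data in the unit square.
points = [(0.2, 0.9), (0.5, 0.5), (0.9, 0.2), (0.4, 0.4), (0.1, 0.1)]
sky = skyline(points)
```

Here `(0.4, 0.4)` and `(0.1, 0.1)` drop out because `(0.5, 0.5)` dominates both.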
Example
Convex hull (sky convex)
[Figure: scatter plot on axes X and Y (both from 0 to 1) with a convex hull point marked.]
Convex hull size Problem
Curvature effect
Convex hull size Problem
Effect of the number of attributes (m)
[Figure: one series per number of attributes, m = 2, 3, 4, 5, 6.]
Regret-Ratio Minimizing Set
Problem:
Find a subset of size at most r
that minimizes the maximum
regret-ratio over all functions
Regret: $f(t) - f(t')$
Regret-ratio: $\frac{f(t) - f(t')}{f(t)}$
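A minimal Python sketch of the regret-ratio for linear ranking functions follows. It assumes $t$ is the best tuple in the full dataset and $t'$ the best tuple in the chosen subset for a given weight vector; the names `score`, `regret_ratio`, and `max_regret_ratio` and the sample data are illustrative only. The maximum is taken over a finite list of functions, which only approximates the maximum over all functions.

```python
def score(w, t):
    """Linear ranking function f = sum of w_i * A_i."""
    return sum(wi * ai for wi, ai in zip(w, t))


def regret_ratio(data, subset, w):
    """(f(t) - f(t')) / f(t), with t the best tuple in data and
    t' the best tuple in subset under weight vector w."""
    best_all = max(score(w, t) for t in data)
    best_sub = max(score(w, t) for t in subset)
    return (best_all - best_sub) / best_all


def max_regret_ratio(data, subset, functions):
    """Worst-case regret-ratio over a finite list of weight vectors."""
    return max(regret_ratio(data, subset, w) for w in functions)


# Hypothetical data: keeping only (0.6, 0.6) costs 40% on f = x or f = y.
data = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]
subset = [(0.6, 0.6)]
functions = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
worst = max_regret_ratio(data, subset, functions)
```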
Overview of the literature,
Our contributions
The regret-ratio notion and the problem were first proposed in [Nanongkai et al. VLDB 2010].
In two-dimensional data:
◦ [Chester et al. VLDB 2014]: sweeping line, $O(r \cdot n^2)$
◦ This work: a dynamic programming algorithm, $O(r \cdot s \cdot \log s \cdot \log c) < O(r \cdot n \cdot (\log n)^2)$, where s is the skyline size and c is the convex hull size.
In higher-dimensional data:
◦ Complexity: NP-complete
◦ For arbitrary dimensions: [Chester et al. VLDB 2014]
◦ Recently for fixed dimensions: [W. Cao et al. ICDT 2017], [P. K. Agarwal et al. arXiv:1702.01446, 2017]
◦ Existing work: (a) a greedy heuristic with unproven theoretical guarantee, (b) a simple attribute space discretization with a fixed upper bound on the regret-ratio of the output [Nanongkai et al. VLDB 2010].
◦ This work: a linearithmic-time approximation algorithm that guarantees a regret-ratio within any arbitrarily small, user-controllable distance from the optimal regret-ratio.
◦ Assumption: fixed number of dimensions
2D-RRMS (Two-Dimensional Regret-Ratio Minimizing Set)
High-level idea
[Figure: skyline points $t_0, t_1, \ldots, t_7$ ordered from top-left to bottom-right.]
Order the skyline points from top-left to bottom-right, add two dummy points $t_0$ and $t_{s+1}$, and construct a complete weighted graph on these points.
The weight of an edge is the maximum regret-ratio of removing all the points in its top-right half-space → use binary search.
Apply dynamic programming, DP($t_i$, r′): the optimal solution from $t_i$ to $t_{s+1}$ with at most r′ intermediate steps.
Running time: $O(r \cdot s \cdot \log s \cdot \log c)$
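The dynamic programming step can be sketched in Python as a bottleneck path search: among paths from the first dummy node to the last with at most r intermediate nodes, find the one whose largest edge weight is smallest. This sketch takes the edge weights as a given matrix and does not compute them. It uses a plain recursion costing $O(r \cdot s^2)$, not the faster binary-search variant on the slide. The name `min_bottleneck_path` and the toy weights are assumptions for illustration.

```python
from functools import lru_cache


def min_bottleneck_path(w, r):
    """w[i][j] (i < j) is the weight of edge (i, j); node 0 and node
    len(w) - 1 play the role of the dummy points. Returns (cost, nodes),
    where nodes are the intermediate nodes on a best path."""
    last = len(w) - 1

    @lru_cache(maxsize=None)
    def dp(i, k):
        # Best path from node i to the last node, at most k intermediate nodes.
        best = (w[i][last], (last,))
        if k > 0:
            for j in range(i + 1, last):
                sub_cost, sub_path = dp(j, k - 1)
                cand = max(w[i][j], sub_cost)
                if cand < best[0]:
                    best = (cand, (j,) + sub_path)
        return best

    cost, path = dp(0, r)
    return cost, [p for p in path if p != last]


INF = float("inf")
# Hypothetical weights on 4 nodes (two dummies and two skyline points).
w = [
    [INF, 0.0, 0.3, 0.9],
    [INF, INF, 0.0, 0.4],
    [INF, INF, INF, 0.0],
    [INF, INF, INF, INF],
]
```

With these weights, allowing one intermediate node keeps node 2 at cost 0.3, and allowing two keeps both nodes at cost 0.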
HD-RRMS (Higher-Dimensional Regret-Ratio Minimizing Set)
Steps
RRMS
• Start with a conceptual model
• Discuss its problems
DMM
• Propose the idea of function space discretization
• Transform RRMS to a Min Max problem
MRST
• Define the intermediate problem “Min Rows Satisfying a Threshold”
• Transform MRST to a fixed-size instance of the Set-cover problem
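Conceptual Model
Transform the problem to a min-max problem
[Figure: matrix with one row per tuple $t_1, t_2, \ldots, t_s$ and one column per function $f$ in F (all possible functions). Each cell holds the regret-ratio on $f$ if only that row's tuple remains (for example, only $t_2$). The matrix is annotated Max(Min).]
Problem 1:
◦ F is continuous → infinite number of columns
◦ Matrix discretization
Problem 2:
◦ Even if we could construct the matrix, there are $\binom{n}{r}$ candidate subsets to check to solve it
◦ Transform to fixed-size set-cover instances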
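The conceptual model can be made concrete with a small Python sketch over a finite set of functions. It builds the matrix of regret-ratios and then tries every subset of r rows, which is exactly the exhaustive search that Problem 2 rules out at scale. The names `regret_matrix` and `brute_force_min_max` and the three sample points are illustrative assumptions.

```python
from itertools import combinations


def regret_matrix(points, functions):
    """M[i][j] = regret-ratio on function j if only point i remains."""
    def score(w, t):
        return sum(wi * ai for wi, ai in zip(w, t))

    best = [max(score(f, p) for p in points) for f in functions]
    return [[(b - score(f, p)) / b for f, b in zip(functions, best)]
            for p in points]


def brute_force_min_max(M, r):
    """Try every r-subset of rows; the cost of a subset is the max over
    columns of the min over its rows. Exponential in general."""
    cols = range(len(M[0]))
    best_val, best_rows = float("inf"), None
    for rows in combinations(range(len(M)), r):
        val = max(min(M[i][j] for i in rows) for j in cols)
        if val < best_val:
            best_val, best_rows = val, rows
    return best_val, best_rows


# Hypothetical data: three points and three linear functions.
points = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]
functions = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
M = regret_matrix(points, functions)
val1, rows1 = brute_force_min_max(M, 1)
val2, rows2 = brute_force_min_max(M, 2)
```

With one row allowed, the middle point `(0.7, 0.7)` is the best compromise; with two rows, the two extreme points win.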
Matrix Discretization
[Figure: a function $f$ located in the function space by the angles $\theta_1$ and $\theta_2$.]
Arbitrarily small user-controllable
distance from the optimal solution
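One way to picture the discretization is a Python sketch that places linear functions on a regular grid of polar angles. It assumes each of the m - 1 angles in [0, π/2] is cut into γ equal steps, which may differ from the exact scheme in the paper. The name `discretized_functions` and the parameter `gamma` are illustrative.

```python
import itertools
import math


def discretized_functions(m, gamma):
    """Return unit weight vectors in m dimensions, one per grid point
    of the (m - 1) polar angles, each angle taking gamma + 1 values
    evenly spaced in [0, pi/2]."""
    angles = [k * (math.pi / 2) / gamma for k in range(gamma + 1)]
    functions = []
    for thetas in itertools.product(angles, repeat=m - 1):
        w = []
        sin_prod = 1.0
        for th in thetas:
            # Hyperspherical coordinates: peel off one cosine per angle.
            w.append(sin_prod * math.cos(th))
            sin_prod *= math.sin(th)
        w.append(sin_prod)
        functions.append(tuple(w))
    return functions


fs = discretized_functions(3, 4)  # (4 + 1) ** 2 grid points in 3 dimensions
```

A larger `gamma` gives a finer grid, which is the knob behind the user-controllable distance from the optimal solution.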
DMM: Discretized Min Max Problem
[Figure: matrix M with one row per tuple $t_1, t_2, \ldots, t_s$ and one column per function $f$ in F (discretized function space), annotated Max(Min).]
Observation: the optimal regret-ratio is one of the cell values!
Order the values in M.
Do a binary search over the values, and for each value define an intermediate problem:
◦ Min. rows satisfying the threshold (MRST)
Convert M to a (fixed-size) binary matrix: 1 if the regret-ratio of t for f is at most the threshold, 0 otherwise
Convert MRST to a (fixed-size) set-cover instance
For fixed values of $m$ and $\gamma$, it can be solved in constant time.
→ The running time of HD-RRMS is $O(n \log n)$
Practical HD-RRMS: use the greedy approximation algorithm for solving the set-cover instances
1. Accept a result if its size is at most $r \, m \log(\gamma)$: index size increases, no change in quality of output
2. Accept the result if its size is at most $r$: index size does not change, output quality may increase.
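The binary search over cell values, combined with a greedy set cover, can be sketched in Python as below. The sketch follows the second acceptance rule (size at most r). The names `greedy_cover` and `hd_rrms_sketch` and the toy matrix are assumptions, not the authors' implementation. Greedy cover sizes need not shrink monotonically as the threshold grows, so the binary search here is a heuristic.

```python
def greedy_cover(row_sets, num_cols):
    """Greedy set cover: repeatedly take the row covering the most
    uncovered columns. Returns chosen row indices, or None if some
    column cannot be covered."""
    uncovered = set(range(num_cols))
    chosen = []
    while uncovered:
        best = max(range(len(row_sets)),
                   key=lambda i: len(row_sets[i] & uncovered))
        gain = row_sets[best] & uncovered
        if not gain:
            return None
        chosen.append(best)
        uncovered -= gain
    return chosen


def hd_rrms_sketch(M, r):
    """Binary search over the sorted cell values of M. For each threshold,
    binarize M (cell <= threshold) and accept if greedy needs <= r rows."""
    values = sorted({v for row in M for v in row})
    lo, hi = 0, len(values) - 1
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        eps = values[mid]
        row_sets = [{j for j, v in enumerate(row) if v <= eps} for row in M]
        cover = greedy_cover(row_sets, len(M[0]))
        if cover is not None and len(cover) <= r:
            best = (eps, cover)
            hi = mid - 1  # try a smaller threshold
        else:
            lo = mid + 1  # threshold too tight for r rows
    return best


# Hypothetical 3 x 3 matrix of regret-ratios.
M = [[0.0, 0.5, 0.9],
     [0.4, 0.0, 0.3],
     [0.8, 0.2, 0.0]]
eps1, cover1 = hd_rrms_sketch(M, 1)
eps2, cover2 = hd_rrms_sketch(M, 2)
```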
Experiments
Setup
Synthetic Data:
◦ Three datasets (correlated, independent, and anti-correlated), 10M tuples over 10 ordinal attributes.
Real-world Datasets:
◦ Airline dataset: 5.8M records over two ordinal attributes.
◦ US Department of Transportation (DOT) dataset: 457K records over 7 ordinal attributes.
◦ NBA dataset: 21K tuples over 17 ordinal attributes.
2D-RRMS
[Figures: results on the NBA dataset and the Airline dataset.]
HD-RRMS
[Figures: results on the DOT dataset and the NBA dataset.]
Thank You!