Tuning the top-k view update process

View Usability and Safety for the
Answering of Top-k Queries via
Materialized Views
Eftychia Baikousi
Panos Vassiliadis
University of Ioannina
Dept. of Computer Science
Forecast

Problem of answering a top-k query
through materialized top-n views



Theoretical guarantees when a top-n materialized
view can answer a top-k query
Algorithmic techniques for answering a top-k
query from a materialized view
Properties of the safe areas of views
DOLAP 2009, Hong Kong, 6 Nov 2009
Contents

Motivation & Problem Definition

Overview of the Method

Theoretical guarantees

Strictness of theorem

Safe area properties

Experiments

Conclusions

Future extensions
Top-k query
Given
  a relation R (id, x1, x2, x3) and
  a query Q, sum(x1, x2, x3)
Find the k tuples with the highest grades according to Q

R
id   x1    x2    x3    sum
a    0.3   0.6   0.7   1.6
b    0.2   0.3   0.4   0.9
c    0.4   0.5   0.9   1.8
d    0.7   0.6   0.1   1.4
Top-2 tuples: c (1.8) and a (1.6)
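A minimal sketch of this top-k evaluation, using the relation R from the slide. The function name top_k and the dictionary layout are illustrative assumptions, not part of the original talk.

```python
import heapq

# Relation R from the slide: id -> (x1, x2, x3)
R = {
    "a": (0.3, 0.6, 0.7),
    "b": (0.2, 0.3, 0.4),
    "c": (0.4, 0.5, 0.9),
    "d": (0.7, 0.6, 0.1),
}

def top_k(relation, k, score=sum):
    """Return the k (id, score) pairs with the highest score, best first."""
    scored = ((tid, score(values)) for tid, values in relation.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

# ids of the top-2 tuples according to Q = sum(x1, x2, x3)
top2_ids = [tid for tid, _ in top_k(R, 2)]
```

Passing a different score callable (for example a weighted sum) evaluates a different ranking query over the same relation.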
Motivating Example
Telecommunication Company
  Executives see sales reports on PDAs
Given
  a relation Region (id, name, today_traffic, yesterday_traffic, budget, ..)
  a materialized view V of top-2 regions according to the query
  Q: 0.6*difftraffic + 0.4*budget
Region
id   Name      t_traffic   y_traffic   budget   V
1    LA        18          20          21       7.2
2    NY        42          54          15       -1.2
3    Dallas    26          22          8        4.4
4    Chicago   30          28          11       5.6

V
name     V
LA       7.2
Dallas   4.4
Can a new top-k query (e.g. 0.5*difftraffic + 0.3*budget)
be answered from V ?
Problem definition

Given
  a base relation R (ID, X, Y)
  a materialized view V (ID, X, Y, s)
    that contains the top-n tuples of the form (id, s), where s is defined as
    s = w (a·x + y) and w, a are positive parameters
  a query Q (ID, X, Y, sQ)
    that requests the top k ≤ n tuples of the form (id, sQ), where sQ is defined as
    sQ = wQ (aQ·x + y) and wQ, aQ are positive parameters

Introduce
  an algorithm
    that decides whether V by itself is suitable to answer Q
    and computes Q's answer
Related Work
Gautam Das, Dimitrios Gunopulos, Nick Koudas, Dimitris Tsirogiannis:
"Answering Top-k Queries Using Views", VLDB '06

Answer a top-k query Q by making use of ranking views V

LPTA in two steps
  1. SelectViews (V, Q)
     Selects an efficient subset of views U for answering Q;
     U contains the sorted lists over each attribute of the relation
  2. Answer Q from U
     Linear programming adaptation of the TA algorithm
     Stopping condition: solution of linear program ≤ min (top-k)
Related Work –
Geometric Representation (0)

Assume
  Relation R (ID, X, Y)
  Two views Vu (id, Score1) and Vd (id, Score2)
  Query Q (id, Score)

Scoring functions of the form Score = w (a·x + y)
  Depicted as the line y = a⁻¹·x
Related Work –
Geometric Representation (1)

M: the kth tuple in Q

Stopping condition:
  the sweeping line crosses position A1B

Any point below line AB has a smaller score than M with regard to Q
Related Work –
Geometric Representation (2)

Stopping condition:
  the intersection point S of the two sweeping lines lies on line AB

Any point below line AB has a smaller score than M with regard to Q
Related Work

SelectViews (V, Q) is data dependent
  based on an estimation of the last tuple of Q
  according to the data distribution

No theoretically established guarantees that the
set of views will answer Q
Overview of the method
1. Theoretical guarantees of answering a query Q via a view VU
2. Theoretical guarantees are too strict
3. Parallelism of safe areas
Example
  V: top-3 with score x + 2y
  Q: top-1 with score 2x + y

R
id   x   y   V    Q
a    7   4   15   18
b    2   7   16   11
c    4   2   8    10
d    1   1   3    3
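The scores in this example can be reproduced with the scoring form s = w (a·x + y) used in the problem definition. The sketch below is illustrative only; the function name score and the rewriting of x + 2y as 2·(0.5·x + y) are assumptions made to match that form.

```python
# Tuples of R from the example: id -> (x, y)
R = {"a": (7, 4), "b": (2, 7), "c": (4, 2), "d": (1, 1)}

def score(w, a, x, y):
    """Scoring function in the form s = w * (a*x + y) with w, a > 0."""
    return w * (a * x + y)

# x + 2y == 2 * (0.5*x + y)  ->  w = 2, a = 0.5
v_scores = {tid: score(2, 0.5, x, y) for tid, (x, y) in R.items()}
# 2x + y == 1 * (2*x + y)    ->  w = 1, a = 2
q_scores = {tid: score(1, 2, x, y) for tid, (x, y) in R.items()}

# Contents of the top-3 view V and the true top-1 answer of Q, best first
view_v = sorted(v_scores, key=v_scores.get, reverse=True)[:3]
answer_q = sorted(q_scores, key=q_scores.get, reverse=True)[:1]
```

Here the view holds b, a and c, and the single answer of Q (tuple a) is among them.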
Construction of safe area

VU (ID, X, Y, sU)
  Containing the top n tuples with score sU = wU (aU·x + y)
  tN: the nth tuple in VU
  LU: the line xNU yNU, perpendicular to VU, passing through tN
    and meeting axes X and Y at xNU and yNU
  LQ: the line xNU yQ, perpendicular to Q, passing through xNU
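The construction can be written out as equations. This is a sketch under two assumptions not stated on the slide: the coordinates of tN are named (x_N, y_N), and "perpendicular to VU" is read as an iso-score line a_U x + y = const of the view.

```latex
\begin{align*}
L_U &:\; a_U x + y = a_U x_N + y_N, \\
x_{NU} &= x_N + \frac{y_N}{a_U}, \qquad y_{NU} = a_U x_N + y_N, \\
L_Q &:\; a_Q x + y = a_Q x_{NU}, \qquad y_Q = a_Q x_{NU}.
\end{align*}
```

For the earlier example (V with score x + 2y, so a_U = 1/2, and Q with score 2x + y, so a_Q = 2), the third tuple of V is c = (4, 2). This gives L_U: x/2 + y = 4 with x_NU = 8 and y_NU = 4, and L_Q: 2x + y = 16 with y_Q = 16.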
Safe area


Safe area defined as the area "above" line LQ (shaded area)

Observations
  Any tuple in the safe area has a score (with regard to Q)
  higher than any tuple outside the safe area
  Tuples in the safe area belong to both VU and Q
Answering Q from VU

THEOREM 1
VU can answer Q if the safe area contains at least k tuples

The converse does not always hold
Answering Q from VU cont.

THEOREM 2
VU may still be able to answer Q even if the safe area
contains fewer than k tuples

This holds when the area (yellow triangle) defined by
  line LU,
  the X-axis, and
  line L1, producing the lowest possible score for Q from tuples of VU,
is void of tuples
Algorithm TestViewSuitability

Three main steps

Step 1:
Compute safe area (Q, V)

Step 2:
Count tuples in V that belong to the safe area

Step 3:
If there are at least k, then return (true)
Else return (false)
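The three steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, coordinates are assumed non-negative, the positive factor wQ is dropped because it does not change the ranking, and the safe-area line LQ is taken through whichever axis intercept of LU gives the higher Q-score (the slides draw the case where this is xNU).

```python
def safe_threshold(a_u, a_q, t_n):
    """Q-score level (without the factor w_Q) of the line L_Q.

    t_n = (x, y) is the n-th (last) tuple of the view; x, y >= 0 assumed.
    """
    x_n, y_n = t_n
    y_nu = a_u * x_n + y_n   # L_U meets the Y axis at (0, y_nu)
    x_nu = y_nu / a_u        # L_U meets the X axis at (x_nu, 0)
    # Highest a_q*x + y reachable on or below L_U in the first quadrant
    return max(a_q * x_nu, y_nu)

def check_view_suitability(view, a_u, a_q, k):
    """view: list of (id, x, y) sorted by view score, best first.

    Returns (True, top-k ids for Q) if at least k view tuples lie in the
    safe area, otherwise (False, None).
    """
    _, x_n, y_n = view[-1]
    threshold = safe_threshold(a_u, a_q, (x_n, y_n))            # Step 1
    safe = [(a_q * x + y, tid) for tid, x, y in view
            if a_q * x + y >= threshold]                        # Step 2
    if len(safe) < k:                                           # Step 3
        return False, None
    safe.sort(reverse=True)
    return True, [tid for _, tid in safe[:k]]

# View V (top-3 by x + 2y, i.e. a_u = 0.5) and query Q (2x + y, a_q = 2)
V = [("b", 2, 7), ("a", 7, 4), ("c", 4, 2)]
top1 = check_view_suitability(V, 0.5, 2, 1)
top2 = check_view_suitability(V, 0.5, 2, 2)
```

With the example data only tuple a lies in the safe area, so the check succeeds for k = 1 and fails for k = 2, although the view does in fact hold the two best tuples for Q (a and b). This is the strictness that Theorem 2 addresses.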
Combining two views




Lines LQU, LQD ⊥ Q
  characterizing the safe areas for VU and VD
LQU ║ LQD
  the safe area of one view (VU) is encompassed in
  the safe area of the other view (VD)
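Because both lines are iso-score lines of Q, comparing two safe areas reduces to comparing two numbers. The sketch below is a hypothetical illustration: the function names, the view parameters, and the last tuples are made up, and each LQ is assumed to pass through the axis intercept of the view's own line that gives the higher Q-score.

```python
def lq_level(a_view, a_q, t_n):
    """Q-score level of the line L_Q built from a view's last tuple t_n."""
    x_n, y_n = t_n
    y_axis = a_view * x_n + y_n   # intercept of the view's line on the Y axis
    x_axis = y_axis / a_view      # intercept of the view's line on the X axis
    return max(a_q * x_axis, y_axis)

def widest_safe_area(views, a_q):
    """views: name -> (a_view, t_n). Return the view whose L_Q is lowest.

    Parallel lines: the lower line bounds the larger safe area, which
    encompasses the safe areas of all other views.
    """
    return min(views, key=lambda name: lq_level(views[name][0], a_q, views[name][1]))

# Hypothetical views: (slope parameter a, last tuple of the view)
views = {"VU": (0.5, (4, 2)), "VD": (3, (3, 1))}
best = widest_safe_area(views, a_q=2)
```

In this made-up setting the level for VU is 16 and the level for VD is 10, so the safe area of VU is encompassed in that of VD.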
Experimental methodology


Test the following methods
  Our algorithm
  TA algorithm (it can guarantee view usability correctness)
For the following goals
  Effectiveness: number of queries answered by views
  Efficiency: time savings from usage of queries
Experimental methodology


Experimental parameters:
Parameter                          Symbol   Values
Size of source table R (tuples)    |R|      1×10^4, 5×10^4, 1×10^5
Max size of mat. view (tuples)     k        10, 50, 100, 500, 1000
Number of queries asked            |Q|      100, 1000

Synthetic data sets:
  Random data sets of different sizes for a relation of the form R (ID, X, Y)
  Sequence of queries with random coefficients and result size k
Effectiveness

Percentage of views used for 100 queries
Effectiveness

Percentage of views used for different time spans
Efficiency

Time savings from the usage of queries for different database sizes
and requested results
  Conflicting case
    The number of stored results rises, while the savings drop
    Due to the size of used memory: memory allocation becomes slow
  Probably one view is able to answer a lot of queries
  Savings increase for reasonable k's of size 0.1%
Conclusions


We have provided theoretical and algorithmic
results for the problem of answering top-k
queries via materialized views
Theoretical – algorithmic results:
  Theorem 1: Theoretical guarantees for a view to answer a top-k query
  Theorem 2: Strictness of Theorem 1
  Parallelism of safe areas
Future Work

Optimization in case of time and storage
constraints

View Caching

Hierarchical structures for the set of views

Sorting techniques
Thank you for your attention!
… many thanks to our hosts!
Auxiliary
Time Savings