Answering Top-k Queries Using Views
Gautam Das (Univ. of Texas),
Dimitrios Gunopulos (Univ. of California Riverside),
Nick Koudas (Univ. of Toronto),
Dimitris Tsirogiannis (Univ. of Toronto)
Introduction
R
tid  X1  X2  X3
1    82   1  59
2    53  19  83
3    29  99  15
4    80  45   8
5    28  32  39

Tuples ranked by $f_Q$
tid  Score
2    612
1    543
4    370
3    360
5    343

Preferences are expressed as scoring functions on the attributes of a relation, e.g. $f_Q = 3X_1 + 2X_2 + 5X_3$
Top-k: the k tuples with the highest score
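As a quick check of the ranking above, the following Python sketch scores every tuple of R with $f_Q$ and keeps the k highest by brute force. The names `R`, `score`, and `top_k` are illustrative only; they are not part of the talk.

```python
import heapq

# Base relation R from the slide: tid -> (X1, X2, X3)
R = {
    1: (82, 1, 59),
    2: (53, 19, 83),
    3: (29, 99, 15),
    4: (80, 45, 8),
    5: (28, 32, 39),
}


def score(weights, values):
    """Linear additive scoring function: sum of w_i * X_i."""
    return sum(w * x for w, x in zip(weights, values))


def top_k(relation, weights, k):
    """Return the k (tid, score) pairs with the highest score (full scan)."""
    scored = ((tid, score(weights, vals)) for tid, vals in relation.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


f_Q = (3, 2, 5)  # f_Q = 3*X1 + 2*X2 + 5*X3
print(top_k(R, f_Q, 5))
# [(2, 612), (1, 543), (4, 370), (3, 360), (5, 343)]
```

A full scan like this touches every tuple; the rest of the talk is about avoiding that by reading precomputed rankings.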
VLDB '06
Related Work
TA [Fagin et al. '96]
PREFER [Hristidis et al. '01]
Deterministic stopping condition
Always returns the correct top-k set
Stores multiple copies of base relation R
Uses only one copy to answer a query
We complement existing approaches
Motivation
Query answering using views
Space-Performance tradeoff
Improved efficiency
Can we exploit the same tradeoffs for top-k query answering?
Problem Statement
Ranking Views: materialized results of previously asked top-k queries
Problem: can we answer new ad-hoc top-k queries efficiently using ranking views?
$f_Q = 3X_1 + 2X_2 + 5X_3$, $f_{V_1} = 2X_1 + 5X_2$, $f_{V_2} = X_2 + 4X_3$
R
tid  X1  X2  X3
1    82   1  59
2    53  19  83
3    29  99  15
4    80  45   8
5    28  32  39

V1
tid  Score
3    553
4    385
5    216
2    201
1    169

V2
tid  Score
2    351
1    237
5    188
3    159
4    77
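The two ranking views above can be reproduced with a small Python sketch that evaluates each view's scoring function on R and sorts the (tid, score) pairs by decreasing score. The helper `ranking_view` is hypothetical, and here each view keeps every tuple rather than only its top-k.

```python
# Base relation R: tid -> (X1, X2, X3)
R = {
    1: (82, 1, 59),
    2: (53, 19, 83),
    3: (29, 99, 15),
    4: (80, 45, 8),
    5: (28, 32, 39),
}


def ranking_view(relation, weights):
    """Materialize (tid, score) pairs sorted by decreasing score."""
    pairs = [(tid, sum(w * x for w, x in zip(weights, vals)))
             for tid, vals in relation.items()]
    return sorted(pairs, key=lambda p: p[1], reverse=True)


V1 = ranking_view(R, (2, 5, 0))  # f_V1 = 2*X1 + 5*X2
V2 = ranking_view(R, (0, 1, 4))  # f_V2 = X2 + 4*X3
print(V1)  # [(3, 553), (4, 385), (5, 216), (2, 201), (1, 169)]
print(V2)  # [(2, 351), (1, 237), (5, 188), (3, 159), (4, 77)]
```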
Outline
LPTA Algorithm
View Selection Problem
Cost Estimation Framework
View Selection Algorithms
Experimental Evaluation
Conclusions
LPTA - Setting
Linear additive scoring functions, e.g. $f_Q = 3X_1 + 2X_2 + 5X_3$
Set of views:
Materialized result of a previously executed top-k query
Arbitrary subset of attributes
Sorted access on pairs $\langle tid, score_Q(tid) \rangle$
Random access on the base table R
LPTA - Example
[Figure: Top-1 query Q on R(X1, X2) with views V1 and V2, each a list of (tid, s) pairs read by sorted access; the unit square has corners O (0,0), P (1,0), T (0,1), R (1,1) and axes X1, X2; the stopping condition is marked.]
LPTA
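Linear Programming adaptation of TA
R(X1, X2), views $f_{V_1} = 2X_1 + 5X_2$ and $f_{V_2} = X_1 + 2X_2$, each stored as (tid, Score) pairs
Q: $f_Q = 3X_1 + 10X_2$
At iteration d, sorted access returns $(tid^1_d, s^1_d)$ from V1 and $(tid^2_d, s^2_d)$ from V2; the best score any unseen tuple can still have is
$$\text{unseen}_{\max} = \max f_Q \quad \text{s.t.} \quad 0 \le X_1, X_2 \le 1,\quad 2X_1 + 5X_2 \le s^1_d,\quad X_1 + 2X_2 \le s^2_d$$
Stopping condition: $\text{unseen}_{\max} \le \text{topk}_{\min}$, where $\text{topk}_{\min}$ is the lowest score in the current top-k set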
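A minimal Python sketch of the LPTA loop for two attributes, assuming every view ranks all tuples of R and attribute values lie in [0, 1]. Instead of a general LP solver it enumerates the vertices of the 2-d feasible region, which is exact only in two dimensions; the data set and the helpers `lp_max_2d`, `lpta`, and `make_view` are hypothetical, while the view and query functions are the ones on the slide. The cost it reports is the number of sequential accesses.

```python
from itertools import combinations

EPS = 1e-9  # tolerance for floating-point feasibility checks


def dot(w, x):
    return sum(a * b for a, b in zip(w, x))


def lp_max_2d(objective, constraints):
    """Maximize objective . x over [0,1]^2 subject to a . x <= b for each
    (a, b) in constraints. Two attributes only: a bounded 2-d LP attains its
    optimum at a vertex, so try every intersection of two boundaries."""
    rows = [((1, 0), 1), ((0, 1), 1), ((-1, 0), 0), ((0, -1), 0)] + list(constraints)
    best = float("-inf")
    for (a1, b1), (a2, b2) in combinations(rows, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < EPS:
            continue  # parallel boundaries never meet in a single point
        x = (b1 * a2[1] - a1[1] * b2) / det  # Cramer's rule
        y = (a1[0] * b2 - b1 * a2[0]) / det
        if all(dot(a, (x, y)) <= b + EPS for a, b in rows):
            best = max(best, dot(objective, (x, y)))
    return best


def lpta(R, views, f_Q, k):
    """R: tid -> (X1, X2), used for random access.
    views: list of (weights, [(tid, score), ...] sorted by decreasing score).
    Returns (top-k list of (tid, score), number of sequential accesses)."""
    seen = {}                   # tid -> f_Q score of every tuple seen so far
    last = [None] * len(views)  # score of the last pair read from each view
    accesses = 0
    for d in range(max(len(lst) for _, lst in views)):
        for j, (_, lst) in enumerate(views):  # iteration d: one pair per view
            if d < len(lst):
                tid, s = lst[d]
                accesses += 1
                last[j] = s
                if tid not in seen:
                    seen[tid] = dot(f_Q, R[tid])  # random access on R
        topk = sorted(seen.items(), key=lambda p: p[1], reverse=True)[:k]
        if len(topk) == k:
            # Every unseen tuple t satisfies f_Vj(t) <= last[j] for each view j.
            bounds = [(w, s) for (w, _), s in zip(views, last)]
            if lp_max_2d(f_Q, bounds) <= topk[-1][1]:  # unseen_max <= topk_min
                return topk, accesses
    return sorted(seen.items(), key=lambda p: p[1], reverse=True)[:k], accesses


# Hypothetical data in [0,1]^2.
R = {1: (0.9, 0.1), 2: (0.2, 0.8), 3: (0.5, 0.5),
     4: (0.1, 0.3), 5: (0.7, 0.6), 6: (0.3, 0.2)}


def make_view(w):
    pairs = [(tid, dot(w, x)) for tid, x in R.items()]
    return (w, sorted(pairs, key=lambda p: p[1], reverse=True))


views = [make_view((2, 5)), make_view((1, 2))]  # f_V1, f_V2
top, cost = lpta(R, views, (3, 10), k=2)
print([tid for tid, _ in top], cost)  # [2, 5] 6
```

On this toy data the LP bound after the third round (7.0) drops below the current second-best score (8.1), so LPTA stops after 6 of the 12 possible sequential accesses.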
LPTA - Example (cont’)
[Figure: continuation of the Top-1 example on R(X1, X2): further (tid, s) pairs are read from V1 and V2 until the stopping condition is met; the unit square has corners O (0,0), P (1,0), T (0,1), R (1,1) and axes X1, X2.]
View Selection Problem
Given a collection of views $V = \{V_1, \ldots, V_r\}$ and a query Q, determine the most efficient subset $U \subseteq V$ on which to execute Q.
Conceptual discussion:
Two dimensions
Higher dimensions
View Selection - 2d
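[Figure: view selection in two dimensions: query Q and views V1, V2 in the unit square with corners O (0,0), P (1,0), T (0,1), R (1,1) and axes X, Y; M marks the minimum top-k tuple; points A, A1, B, B1 are labeled.]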
View Selection - Higher d
Theorem: If $V = \{V_1, \ldots, V_r\}$ is a set of views for an m-dimensional dataset and Q a query, the optimal execution of LPTA requires a subset of views $U \subseteq V$ such that $|U| \le m$.
Question: How do we select the optimal subset of views?
Cost Estimation Framework
What is the cost of running LPTA when a specific set of views is used to answer a query?
Cost = number of sequential accesses
[Figure: views V1 and V2, query Q, the minimum top-k tuple, and points A and B; Cost = 6 sequential accesses.]
Can we find that cost without actually running LPTA?
Simulation of LPTA on Histograms
$H_Q$: approximates the score distribution of the query Q
[Figure: histograms $H_Q$, $H_{V_1}$, $H_{V_2}$ with b buckets of n/b tuples each; topk_min and the estimated Cost are marked.]
1. Use $H_Q$ to estimate the score of the k-th highest tuple (topk_min).
2. Simulate LPTA in bucket-by-bucket lock step to estimate the cost.
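The two steps above can be sketched in Python for two attributes under several stated assumptions: every histogram is equi-depth with b buckets of n/b tuples, given as b+1 boundaries in decreasing order; scores are spread uniformly inside a bucket (step 1 interpolates linearly); and one lock step reads one whole bucket from every view, with the bucket's lower boundary standing in for the last score read (step 2). The functions `estimate_topk_min` and `simulate_cost` are hypothetical names, and the 2-d vertex-enumeration LP is repeated so the block runs on its own.

```python
from itertools import combinations

EPS = 1e-9


def dot(w, x):
    return sum(a * b for a, b in zip(w, x))


def lp_max_2d(objective, constraints):
    """Max objective . x over [0,1]^2 with a . x <= b constraints (2-d only)."""
    rows = [((1, 0), 1), ((0, 1), 1), ((-1, 0), 0), ((0, -1), 0)] + list(constraints)
    best = float("-inf")
    for (a1, b1), (a2, b2) in combinations(rows, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < EPS:
            continue
        x = (b1 * a2[1] - a1[1] * b2) / det
        y = (a1[0] * b2 - b1 * a2[0]) / det
        if all(dot(a, (x, y)) <= b + EPS for a, b in rows):
            best = max(best, dot(objective, (x, y)))
    return best


def estimate_topk_min(H_Q, n, k):
    """Step 1: estimate the k-th highest f_Q score from H_Q."""
    b = len(H_Q) - 1
    per = n / b
    i = min(int((k - 1) // per), b - 1)  # bucket that holds the k-th tuple
    p = k - i * per                      # rank of that tuple inside the bucket
    return H_Q[i] - (H_Q[i] - H_Q[i + 1]) * p / per


def simulate_cost(H_views, view_weights, f_Q, n, topk_min):
    """Step 2: after lock step t every view has been read through its t-th
    bucket. Returns the estimated number of sequential accesses."""
    b = len(H_views[0]) - 1
    per = n / b
    for t in range(1, b + 1):
        bounds = [(w, h[t]) for w, h in zip(view_weights, H_views)]
        if lp_max_2d(f_Q, bounds) <= topk_min:  # simulated stopping condition
            return t * per * len(H_views)
    return b * per * len(H_views)  # every view read to the end


H = [1.0, 0.8, 0.6, 0.4, 0.2, 0.0]  # hypothetical: b = 5 buckets, n = 10 tuples
print(estimate_topk_min([10, 8, 6, 4, 2, 0], n=10, k=3))       # 7.0
print(simulate_cost([H, H], [(1, 0), (0, 1)], (1, 1), 10, 1.0))  # 12.0
```

The estimate never runs LPTA itself: it only solves one small LP per simulated bucket step, which is what makes it cheap enough to call inside a view selection loop.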
View Selection Algorithms
Exhaustive (E): check all $\binom{r}{p}$ subsets of size $p$, for $p \le m$.
Greedy (SV): keep expanding the set of views to use until the estimated cost stops decreasing.
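Both strategies can be sketched in Python around any cost estimator, such as the histogram simulation. Here the estimator is a hypothetical lookup table (`TOY`), and the greedy variant is one plausible reading of the slide: start from the cheapest single view, add the view that lowers the estimated cost the most, and stop when the cost no longer decreases or m views are chosen (the cap follows the theorem that at most m views are needed).

```python
from itertools import combinations


def exhaustive(views, m, cost):
    """E: evaluate every subset of size p = 1..m and return the cheapest."""
    subsets = (frozenset(c) for p in range(1, m + 1) for c in combinations(views, p))
    return min(subsets, key=cost)


def greedy(views, m, cost):
    """SV sketch: grow the chosen set one view at a time while the
    estimated cost keeps decreasing (at most m views)."""
    chosen = frozenset([min(views, key=lambda v: cost(frozenset([v])))])
    best = cost(chosen)
    while len(chosen) < m:
        rest = [v for v in views if v not in chosen]
        if not rest:
            break
        cand = min(rest, key=lambda v: cost(chosen | {v}))
        c = cost(chosen | {cand})
        if c >= best:
            break  # estimated cost stopped decreasing
        chosen, best = chosen | {cand}, c
    return chosen


# Hypothetical estimated costs per subset of views.
TOY = {"A": 9, "B": 7, "C": 8, "AB": 5, "AC": 6, "BC": 4, "ABC": 4}


def toy_cost(subset):
    return TOY["".join(sorted(subset))]


print(sorted(exhaustive(["A", "B", "C"], 2, toy_cost)))  # ['B', 'C']
print(sorted(greedy(["A", "B", "C"], 2, toy_cost)))      # ['B', 'C']
```

The exhaustive search calls the estimator once per subset, which grows quickly with r; the greedy search calls it at most about r times per added view.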
Select Views Spherical (SVS)
Requires the solution of a single linear program.
[Figure: SVS in two dimensions: query Q, max(f_Q) over the view constraints f_Vj with scores s, optimum T, and the selected views; corners (0,0), (1,0), (0,1).]
Select Views By Angle (SVA)
Sort the views by increasing angle with respect to Q.
[Figure: views V1 to V4 at increasing angles 1 to 4 from query Q, drawn in the unit square with corners (0,0), (1,0), (0,1); the selected views are marked.]
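A small Python sketch of the SVA ordering: each view and the query are treated as weight vectors, and views are sorted by the angle between their vector and Q's. Keeping the first m views is an assumption of this sketch, as are the view weights in the usage example.

```python
import math


def angle(u, v):
    """Angle in radians between two scoring-function weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / norms)))  # clamp rounding noise


def select_by_angle(f_Q, views, m):
    """Sort view names by increasing angle with f_Q and keep the first m."""
    return sorted(views, key=lambda name: angle(views[name], f_Q))[:m]


views = {"V1": (2, 5), "V2": (1, 2), "V3": (0, 1), "V4": (1, 0)}  # hypothetical
print(select_by_angle((3, 10), views, m=2))  # ['V1', 'V2']
```

Unlike E and SV, this ordering needs no cost estimates at all, only one angle per view.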
General Queries and Views
Views that materialize only their top-k tuples:
Truncate the view histograms.
Accommodating range conditions:
Select the views that cover the range conditions.
Truncate each attribute's histogram.
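One way to read "cover the range conditions" is that, on every attribute the query restricts, the view's own range must contain the query's range, so the view cannot have dropped any qualifying tuple. The Python sketch below encodes that reading; the `covers` helper and the dictionary representation of ranges are hypothetical.

```python
import math


def covers(view_ranges, query_ranges):
    """True if, for every attribute the query restricts, the view's range on
    that attribute contains the query's range (no entry = unrestricted)."""
    for attr, (lo, hi) in query_ranges.items():
        v_lo, v_hi = view_ranges.get(attr, (-math.inf, math.inf))
        if not (v_lo <= lo and hi <= v_hi):
            return False
    return True


views = {"V1": {"X1": (0, 50)}, "V2": {}, "V3": {"X1": (10, 30)}}  # hypothetical
query = {"X1": (20, 40)}
print([name for name, r in views.items() if covers(r, query)])  # ['V1', 'V2']
```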
Experiments
Datasets (Uniform, Zipf, Real)
Experiments:
Performance comparison of LPTA,
PREFER and TA
Accuracy of the cost estimation framework
Performance of LPTA using each of the
view selection algorithms
Scalability of the LPTA algorithm
Performance comparison of LPTA, PREFER and TA
[Plots: real dataset, 2d; uniform dataset, 3d]
Cost Estimation Accuracy
[Plots (2d): buckets = 0.5% of n; buckets = 1% of n]
Performance of LPTA using View Selection Algorithms
[Plots: (2d); 500K tuples, top-100 (3d)]
Scalability Experiments on LPTA
[Plots: 2d, uniform dataset; 500K tuples, top-100]
Conclusions
Using views for top-k query answering
LPTA: linear programming adaptation of TA
View selection problem, cost estimation framework, view selection algorithms
Experimental evaluation
(Thank You!)
Questions?