Skyline Query Processing for Incomplete Data

Mohamed E. Khalefa, Mohamed F. Mokbel, Justin J. Levandoski
Department of Computer Science and Engineering, University of
Minnesota, Minneapolis, MN, USA
ICDE 2008
Outline
• Introduction
• Problem Formulation
• Methods and Algorithms
• Experimental Results
• Conclusion
Introduction

Existing skyline algorithms assume:
1. Data are complete (every dimension is available for every point).
2. The dominance relation is transitive.
Example: p1 = (2,3,6), p2 = (1,2,4), p3 = (1,1,1)
p1 dominates p2 and p2 dominates p3 => p1 dominates p3.
(Cont.)

If data are incomplete:
1. Some dimensions have no value.
2. The dominance relation is not transitive.
Example: p1 = (2,3,-), p2 = (1,-,8), p3 = (-,4,2)
p1 dominates p2 and p2 dominates p3,
but p1 does not dominate p3; instead, p3 dominates p1.
The dominance relation is cyclic and not transitive.
Problem Formulation
• Dominance relation for incomplete data: P dominates Q if
1. There is at least one dimension ui where both P.ui and Q.ui are known, and P.ui > Q.ui.
2. For all other dimensions uj, j ≠ i, either P.uj is unknown, Q.uj is unknown, or P.uj ≥ Q.uj.
• Example: p1 = (2,3,-), p2 = (1,-,8), p3 = (-,4,-)
p1 dominates p2 (dimension 1: 2 > 1).
p2 does not dominate p3, and p3 does not dominate p2 (they share no known dimension).
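The dominance test above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: representing an unknown dimension as `None` and the function name `dominates` are assumptions made for this example, and larger values are treated as better, as in the definition.

```python
from typing import Optional, Sequence

# Assumption: a point is a sequence of numbers, with None for an unknown dimension.
Point = Sequence[Optional[float]]


def dominates(p: Point, q: Point) -> bool:
    """Return True if p dominates q under the incomplete-data definition."""
    strictly_better = False
    for a, b in zip(p, q):
        if a is None or b is None:
            continue  # unknown in p or q: this dimension is ignored
        if a < b:
            return False  # p is worse on a dimension known in both points
        if a > b:
            strictly_better = True
    # At least one commonly known dimension must be strictly better.
    return strictly_better


p1, p2, p3 = (2, 3, None), (1, None, 8), (None, 4, None)
print(dominates(p1, p2))                     # True
print(dominates(p2, p3), dominates(p3, p2))  # False False
```

Two points with no commonly known dimension never dominate each other, because the loop skips every dimension and `strictly_better` stays false.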
(Cont.)
• Bitmap representation: 0 = unknown dimension, 1 = known dimension.
Example:
point   value     bitmap
p1      (2,3,-)   110
p2      (1,-,4)   101
p3      (-,-,1)   001
p1.B AND p2.B = 100 => comparable
p1.B AND p3.B = 000 => incomparable
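The bitmap test can be illustrated with ordinary integer bit operations. The helper names `bitmap` and `comparable` and the use of `None` for an unknown dimension are assumptions for this sketch; the first dimension is taken as the most significant bit so that the output matches the table above.

```python
from typing import Optional, Sequence


def bitmap(p: Sequence[Optional[float]]) -> int:
    """Encode known dimensions as 1 bits, first dimension most significant."""
    bits = 0
    for value in p:
        bits = (bits << 1) | int(value is not None)
    return bits


def comparable(p: Sequence[Optional[float]], q: Sequence[Optional[float]]) -> bool:
    """Two points are comparable if they share at least one known dimension."""
    return (bitmap(p) & bitmap(q)) != 0


p1, p2, p3 = (2, 3, None), (1, None, 4), (None, None, 1)
print(format(bitmap(p1), "03b"))               # 110
print(format(bitmap(p1) & bitmap(p2), "03b"))  # 100
print(comparable(p1, p3))                      # False
```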
Methods and Algorithms
• The Replacement Algorithm
• The Bucket Algorithm
• The ISkyline Algorithm
The Replacement Algorithm
• Replace every unknown dimension by −∞.
• Use a traditional skyline algorithm on the completed data to get Ssky′.
• Ssky′ is then reduced to the final skyline Ssky using the dominance relation for incomplete data.
Example:
Incomplete data: p1 (4,-,-,8), p2 (6,3,-,9), p3 (-,-,-,10)
Replace => complete data: p1 (4,−∞,−∞,8), p2 (6,3,−∞,9), p3 (−∞,−∞,−∞,10)
Ssky′: p2 (6,3,-,9), p3 (-,-,-,10)
Ssky: p3 (-,-,-,10)
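As a rough illustration of the first two steps, the sketch below fills unknown dimensions with −∞ and runs a naive quadratic skyline scan over the completed data. It only produces the intermediate set of the example, not the final answer. The names `replacement_candidates` and `dominates_complete` are hypothetical, `None` stands for an unknown dimension, and the quadratic scan stands in for whichever traditional skyline algorithm is actually used.

```python
import math
from typing import List, Optional, Tuple

# Assumption: None marks an unknown dimension; larger values are better.
Point = Tuple[Optional[float], ...]


def dominates_complete(p: Tuple[float, ...], q: Tuple[float, ...]) -> bool:
    """Traditional dominance on complete data."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def replacement_candidates(points: List[Point]) -> List[Point]:
    """Replace unknown values by -inf, then keep the traditional skyline."""
    filled = [tuple(-math.inf if v is None else v for v in p) for p in points]
    return [
        points[i]
        for i, p in enumerate(filled)
        if not any(dominates_complete(q, p) for q in filled)
    ]


data = [(4, None, None, 8), (6, 3, None, 9), (None, None, None, 10)]
print(replacement_candidates(data))
# [(6, 3, None, 9), (None, None, None, 10)]
```

In the example, p1 is dropped because the completed p2 is at least as good on every dimension and better on some, while p2 and p3 remain incomparable after replacement.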
The Bucket Algorithm
• Divide all incoming points into distinct buckets so that all points in a bucket have the same bitmap representation.
• Compute the skyline of each bucket: the local skyline.
• Collect all local skylines in one list, termed the candidate skyline.
• Perform an exhaustive pairwise comparison among all candidate points to get the query answer.
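The four steps can be sketched as follows. This is a simplified illustration under stated assumptions: `None` marks an unknown dimension, the bitmap is stored as a tuple of booleans, and the names `dominates`, `skyline`, and `bucket_skyline` are chosen for this example only. The pairwise step checks every candidate against every other candidate before discarding anything, since dominance between buckets can be cyclic.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Point = Tuple[Optional[float], ...]


def dominates(p: Point, q: Point) -> bool:
    """Incomplete-data dominance: compare only dimensions known in both points."""
    pairs = [(a, b) for a, b in zip(p, q) if a is not None and b is not None]
    return all(a >= b for a, b in pairs) and any(a > b for a, b in pairs)


def skyline(points: List[Point]) -> List[Point]:
    """Exhaustive pairwise filter: keep points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def bucket_skyline(points: List[Point]) -> List[Point]:
    # Step 1: group points by their bitmap (same set of known dimensions).
    buckets: Dict[Tuple[bool, ...], List[Point]] = defaultdict(list)
    for p in points:
        buckets[tuple(v is not None for v in p)].append(p)
    # Steps 2 and 3: local skyline of each bucket, collected as candidates.
    candidates = [p for bucket in buckets.values() for p in skyline(bucket)]
    # Step 4: exhaustive pairwise comparison among the candidates.
    return skyline(candidates)


data = [(6, 4, None), (9, None, 1), (None, 3, 1),
        (9, 3, None), (6, None, 1), (None, 6, 5)]
print(bucket_skyline(data))  # [(None, 6, 5)]
```

On this input the candidate skyline holds four of the six points, and only (-,6,5) survives the final comparison.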
(Cont.)
[Figure: local skylines, candidate skyline, and global skyline of the bucket algorithm]
(Cont.)
• In general, it performs better than the replacement algorithm because the candidate skyline is likely to be smaller than the set Ssky′ in the replacement algorithm.
• The candidate skyline may still grow excessively large.
• It misses the chance to use the bucket data to reduce the number of comparisons.
The ISkyline Algorithm
• Virtual Points
• Shadow Points
• The ISkyline Algorithm
Virtual Points
P1, P2, Q1, P3, P4 arrive in this order.
Shadow Points
Q1 dominates P3 => add the virtual point Q1v to P's local skyline.
Q4 is dominated by P3, but only the local skyline is checked,
so Q4 is not detected as dominated.
(Cont.)
• Shadow points: points that are dominated only by virtual points.
Q1 is dominated by S4v.
Q3 is dominated by S4v.
The ISkyline Algorithm
• Phase I: insert P into its node.
1. If P is dominated by a real point in the local skyline => remove P.
2. If P is dominated only by a virtual point in the local skyline => insert P into the shadow skyline.
3. If P is a local skyline point => insert P into the candidate skyline (go to Phase II).
• Phase II: if the number of candidate skyline points exceeds t => insert them into the global skyline.
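The three Phase I cases can be sketched as a small classification function. This is only an illustration of the case analysis above, not the full algorithm: it does not maintain virtual points, does not evict existing local skyline points, and leaves out Phase II. The name `phase_one`, the string labels, and the use of `None` for an unknown dimension are assumptions for this example.

```python
from typing import List, Optional, Tuple

Point = Tuple[Optional[float], ...]


def dominates(p: Point, q: Point) -> bool:
    """Incomplete-data dominance: compare only dimensions known in both points."""
    pairs = [(a, b) for a, b in zip(p, q) if a is not None and b is not None]
    return all(a >= b for a, b in pairs) and any(a > b for a, b in pairs)


def phase_one(p: Point, real_points: List[Point], virtual_points: List[Point]) -> str:
    """Classify an incoming point against the local skyline of its node."""
    if any(dominates(r, p) for r in real_points):
        return "removed"    # case 1: dominated by a real local skyline point
    if any(dominates(v, p) for v in virtual_points):
        return "shadow"     # case 2: dominated only by virtual points
    return "candidate"      # case 3: a local skyline point


q1v = (9, None, None)  # hypothetical virtual point stored in a node with bitmap 110
print(phase_one((6, 4, None), [], [q1v]))             # shadow
print(phase_one((9, 3, None), [], [q1v]))             # candidate
print(phase_one((6, None, 1), [(9, None, 1)], []))    # removed
```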
(Cont.)
• Example with t = 2 and three nodes: Node P = 110, Node Q = 101, Node R = 011.
• P1(6,4,-) arrives and is stored in Node P.
• Candidate skyline: P1.
(Cont.)
• Q1(9,-,1) arrives and is stored in Node Q. Candidate skyline: P1, Q1.
• Q1 dominates P1 (dimension 1: 9 > 6), so the virtual point Q1v(9,-,-) is added to Node P.
• P1(6,4,-) is dominated only by a virtual point, so it moves to the shadow skyline.
(Cont.)
• R1(-,3,1) arrives and is stored in Node R.
• Candidate skyline: Q1, R1. Node P holds Q1v(9,-,-). Shadow skyline: P1(6,4,-).
(Cont.)
• P2(9,3,-) arrives and is stored in Node P next to Q1v(9,-,-).
• Candidate skyline: Q1, R1, P2. Since |Candidate skyline| > 2 (t = 2), the candidates are inserted into the global skyline.
• Compare against the shadow skyline: R1 is dominated by P1 (dimension 2: 4 > 3) and is removed.
• Global skyline: Q1, P2. Shadow skyline: P1(6,4,-).
(Cont.)
• Q2(6,-,1) arrives at Node Q, where it is dominated by the local skyline point Q1(9,-,1) (dimension 1: 9 > 6), so it does not become a candidate.
• Global skyline: Q1, P2. Shadow skyline: P1(6,4,-).
(Cont.)
• R2(-,6,5) arrives at Node R. R2 dominates R1(-,3,1). Candidate skyline: R2.
• Check the candidate skyline against the global skyline: Q1 and P2 are dominated by R2 (Q1 on dimension 3: 5 > 1; P2 on dimension 2: 6 > 3) and are removed.
• Result: the global skyline is R2.
Experimental Results
Conclusion
• Based on traditional skyline queries: the Replacement Algorithm and the Bucket Algorithm.
• New method: the ISkyline Algorithm.
• The ISkyline Algorithm performs best of the three.