Skyline Query Processing for Incomplete Data
Mohamed E. Khalefa, Mohamed F. Mokbel, Justin J. Levandoski
Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
ICDE 2008

Outline
Introduction
Problem Formulation
Methods and Algorithms
Experiment Results
Conclusion

Introduction
Existing skyline algorithms assume:
1. Data are complete (all dimensions are available for all data points).
2. The dominance relation is transitive.
Example: p1 = (2,3,6), p2 = (1,2,4), p3 = (1,1,1)
p1 dominates p2 and p2 dominates p3 => p1 dominates p3.

Introduction (Cont.)
If data are incomplete:
1. Some dimensions have no value.
2. The dominance relation is not transitive.
Example: p1 = (2,3,-), p2 = (1,-,8), p3 = (-,4,2)
p1 dominates p2 and p2 dominates p3, but p1 does not dominate p3. Instead, p3 dominates p1.
The dominance relation is cyclic and not transitive.

Problem Formulation
Dominance relation for incomplete data: P dominates Q if
1. There is at least one dimension ui where both P.ui and Q.ui are known, and P.ui > Q.ui.
2. For all other dimensions uj, j ≠ i, either P.uj is unknown, Q.uj is unknown, or P.uj ≥ Q.uj.
Example: p1 = (2,3,-), p2 = (1,-,8), p3 = (-,4,-)
p1 dominates p2. p2 does not dominate p3, and p3 does not dominate p2.

Problem Formulation (Cont.)
Bitmap representation:
0: unknown dimension
1: known dimension
Example:
Point  Value    Bitmap
p1     (2,3,-)  110
p2     (1,-,4)  101
p3     (-,-,1)  001
p1.B AND p2.B = 100 <- comparable
p1.B AND p3.B = 000 <- incomparable

Methods and Algorithms
The Replacement Algorithm
The Bucket Algorithm
The ISkyline Algorithm

The Replacement Algorithm
Replace each unknown dimension by −∞.
Use a traditional skyline algorithm to get Ssky.
Incomplete data: p1 = (4,-,-,8), p2 = (6,3,-,9), p3 = (-,-,-,10)
Complete data after replacement: p1 = (4,−∞,−∞,8), p2 = (6,3,−∞,9), p3 = (−∞,−∞,−∞,10)
Ssky: p2 = (6,3,-,9), p3 = (-,-,-,10)
Comparing the points of Ssky in their original incomplete form leaves p3 = (-,-,-,10), since p3 dominates p2 (10 > 9 in the only dimension known in both).

The Bucket Algorithm
Divide all incoming points into distinct buckets, where all points in a bucket have the same bitmap representation.
Compute the skyline of each bucket (the local skyline).
Collect all local skylines in one list, termed the candidate skyline.
Perform an exhaustive pairwise comparison among all candidate points to get the query answer.

The Bucket Algorithm (Cont.)
[Figure: Local Skyline, Candidate Skyline, Global Skyline]

The Bucket Algorithm (Cont.)
In general, performance is better than the Replacement Algorithm because the candidate list is likely to be smaller than the set Ssky in the Replacement Algorithm.
Drawbacks:
1. The candidate skyline may grow excessively large.
2. It misses the chance to use the bucket data to reduce the number of comparisons.

The ISkyline Algorithm
Virtual Points
Shadow Points
The ISkyline Algorithm

Virtual Points
P1, P2, Q1, P3, P4 arrive in this order.

Shadow Points
Q1 dominates P3 => add the virtual point Q1v to P's local skyline.
Q4 is dominated by P3, but only the local skyline is checked, so Q4 is not found to be dominated.

Shadow Points (Cont.)
Shadow points: points that are dominated only by virtual points.
Q1 is dominated by S4v. Q3 is dominated by S4v.

The ISkyline Algorithm
Phase I: insert a point P.
1. If P is dominated by a real point in the local skyline => remove P.
2. If P is dominated by a virtual point in the local skyline => insert P into the shadow skyline.
3. If P is a local skyline point => insert P into the candidate skyline (go to Phase II).
Phase II: if the size of the candidate skyline exceeds t => insert the candidate points into the global skyline.

The ISkyline Algorithm (Cont.): example with t = 2
Node P = 110, Node Q = 101, Node R = 011
P1(6,4,-) arrives and is stored in node P. Candidate skyline: P1.
Q1(9,-,1) arrives and is stored in node Q. Candidate skyline: P1, Q1.
Q1 dominates P1 (9 > 6 in dimension 1): the virtual point Q1v(9,-,-) is added to node P, and P1 moves to the shadow skyline. Candidate skyline: Q1.
R1(-,3,1) arrives and is stored in node R. Candidate skyline: Q1, R1. Node P holds Q1v(9,-,-), and the shadow skyline holds P1(6,4,-).
P2(9,3,-) arrives and is stored in node P. Candidate skyline: Q1, R1, P2.
|Candidate skyline| = 3 > t = 2 => insert the candidate points into the global skyline, comparing them against the shadow skyline.
R1 is dominated by the shadow point P1 (4 > 3 in dimension 2), so R1 is dropped from the skyline lists (it stays in node R). Global skyline: Q1, P2.
Q2(6,-,1) arrives at node Q. Q1 dominates Q2 (9 > 6 in dimension 1), so Q2 does not become a candidate. Global skyline: Q1, P2.
R2(-,6,5) arrives at node R. R2 dominates R1 (6 > 3 in dimension 2 and 5 > 1 in dimension 3), so R2 replaces R1 in the local skyline of node R and enters the candidate skyline.
Check R2 against the candidate skyline and the global skyline: Q1 and P2 are dominated by R2 (5 > 1 in dimension 3 for Q1, 6 > 3 in dimension 2 for P2).
Result: the global skyline is R2.

Experiment Results
[Figures: experiment result plots]

Conclusion
Based on traditional skyline queries: the Replacement Algorithm and the Bucket Algorithm.
New method: the ISkyline Algorithm.
The ISkyline Algorithm performs best of the three.
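
Sketch: dominance, bitmaps, and the Bucket Algorithm
The dominance relation, the bitmap comparability test, and the Bucket Algorithm from the slides can be sketched in Python roughly as follows. This is a minimal illustration and not the authors' implementation. The function names (bitmap, comparable, dominates, skyline, bucket_skyline) and the use of None for an unknown dimension are assumptions made for this example. Larger values are treated as better, as in the slides.

```python
from collections import defaultdict
from typing import List, Optional, Tuple

# Assumption for this sketch: a point is a tuple, None marks an unknown dimension.
Point = Tuple[Optional[float], ...]


def bitmap(p: Point) -> int:
    """Bit 1 for a known dimension, 0 for an unknown one (first dimension is the leftmost bit)."""
    bits = 0
    for value in p:
        bits = (bits << 1) | int(value is not None)
    return bits


def comparable(p: Point, q: Point) -> bool:
    """Two points are comparable if they share at least one known dimension."""
    return (bitmap(p) & bitmap(q)) != 0


def dominates(p: Point, q: Point) -> bool:
    """p dominates q: never worse on dimensions known in both, strictly better on at least one."""
    strictly_better = False
    for a, b in zip(p, q):
        if a is None or b is None:
            continue  # skip dimensions unknown in either point
        if a < b:
            return False
        if a > b:
            strictly_better = True
    return strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Exhaustive pairwise comparison: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def bucket_skyline(points: List[Point]) -> List[Point]:
    """Bucket Algorithm sketch: local skyline per bitmap, then compare all candidates."""
    buckets = defaultdict(list)
    for p in points:
        buckets[bitmap(p)].append(p)
    candidates = [p for group in buckets.values() for p in skyline(group)]
    return skyline(candidates)
```

Called on the six points of the ISkyline example, (6,4,-), (9,-,1), (-,3,1), (9,3,-), (6,-,1), (-,6,5), bucket_skyline returns only (None, 6, 5), that is, R2. Called on the cyclic example from the Introduction, (2,3,-), (1,-,8), (-,4,2), it returns an empty list, because every point is dominated by another one.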