Randomized Multi-pass Streaming Skyline Algorithms
VLDB '09

Outline
◦ Introduction
◦ Algorithms
  ◦ Streaming algorithm
  ◦ Fixed-window algorithm
◦ Experimental Results
◦ Conclusions

Introduction
Skyline
◦ The set of points (tuples) that are not dominated by any other point in a d-dimensional dataset. A point p dominates a point q if p is at least as good as q in every dimension and strictly better in at least one.
◦ Example (a higher guest rating and a lower price are better):

Hotel   Guest Rating   Price
A       5.0            $63.50
B       4.1            $105.80
C       3.0            $62.17
D       4.7            $228.37
E       4.6            $126.26

Goal
◦ Compute the skyline of a massive database with strong worst-case performance guarantees.

Algorithms
Main idea
◦ Suppose m, the number of skyline points, is known.
◦ Theorem: in 3 passes and O(m) space, we can find skyline points that "dominate" at least n/2 points, with high probability.

ELIMINATE-POINTS(m)
Stream: p1, p2, …, pn
1. Sample x = 24m points p'1, p'2, …, p'x.
2. Go through the stream; whenever the current point dominates some p'i, replace p'i by that point.
3. For each p'i, delete p'i and all points it dominates, and output p'1, p'2, …, p'x.

Example (smaller coordinates are better)
Stream: (1,5), (3,4), (4,5), (4,3), (3,3), (4,4)
Sample: (4,4), replaced by (3,4), then by (3,3)
Step 3 outputs (3,3) and deletes it together with (3,4), (4,5), (4,3), and (4,4), all of which it dominates; only (1,5) remains.

Draw trees: each point points to its first dominating point.
[Figure: dominance trees for the example stream (1,5), (3,4), (4,5), (4,3), (3,3), (4,4)]
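A minimal Python sketch of the skyline definition, applied to the hotel table. It assumes, as the example implies, that a higher guest rating and a lower price are better. The names `HOTELS`, `dominates`, and `skyline` are illustrative, and the quadratic scan is only a brute-force check of the definition, not the streaming algorithm.

```python
from typing import Dict, List, Tuple

# Hotel data from the table: (guest rating, price).
HOTELS: Dict[str, Tuple[float, float]] = {
    "A": (5.0, 63.50),
    "B": (4.1, 105.80),
    "C": (3.0, 62.17),
    "D": (4.7, 228.37),
    "E": (4.6, 126.26),
}

def dominates(p: Tuple[float, float], q: Tuple[float, float]) -> bool:
    """p dominates q if p is no worse in both criteria and strictly better in one.

    Assumption: a higher rating and a lower price are better.
    """
    no_worse = p[0] >= q[0] and p[1] <= q[1]
    strictly_better = p[0] > q[0] or p[1] < q[1]
    return no_worse and strictly_better

def skyline(points: Dict[str, Tuple[float, float]]) -> List[str]:
    """Brute-force O(n^2) skyline: keep every point no other point dominates."""
    return [
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    ]

print(skyline(HOTELS))  # ['A', 'C']
```

A dominates B, D, and E, while C survives because it is cheaper than every other hotel.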
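The three passes of ELIMINATE-POINTS(m) can be simulated over an in-memory list, with each pass written as a loop. This is a sketch under stated assumptions, not the paper's implementation: points are tuples compared under the example's smaller-is-better convention, the 24m sample is drawn uniformly with replacement, and the optional `sample` argument (hypothetical, added only to reproduce the worked example) overrides the random draw.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[int, ...]

def dominates(p: Point, q: Point) -> bool:
    """Smaller-is-better convention (as in the example): p dominates q if
    p is <= q in every coordinate and the two points differ."""
    return all(a <= b for a, b in zip(p, q)) and p != q

def eliminate_points(
    stream: Sequence[Point],
    m: int,
    rng: Optional[random.Random] = None,
    sample: Optional[List[Point]] = None,
) -> Tuple[List[Point], List[Point]]:
    """Sketch of ELIMINATE-POINTS(m) on a list standing in for the stream.

    Returns (points output in step 3, points left in the stream).
    """
    rng = rng or random.Random()
    # Pass 1 (step 1): sample x = 24m points, uniformly with replacement (assumption).
    if sample is None:
        sample = rng.choices(list(stream), k=24 * m) if stream else []
    sample = list(sample)
    # Pass 2 (step 2): whenever the current point dominates a sampled point, replace it.
    for p in stream:
        for i, s in enumerate(sample):
            if dominates(p, s):
                sample[i] = p
    # Pass 3 (step 3): output the distinct sample points and delete them
    # together with every point they dominate.
    output = list(dict.fromkeys(sample))
    remaining = [
        q for q in stream
        if q not in output and not any(dominates(s, q) for s in output)
    ]
    return output, remaining

STREAM = [(1, 5), (3, 4), (4, 5), (4, 3), (3, 3), (4, 4)]
# Reproduce the worked example by forcing the sample to be (4,4).
print(eliminate_points(STREAM, m=1, sample=[(4, 4)]))  # ([(3, 3)], [(1, 5)])
```

Because a sampled point is only ever replaced by a point that dominates it, every point still in the sample after pass 2 is undominated by the whole stream, so step 3 only outputs skyline points.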
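The dominance trees can also be built explicitly. This sketch assumes that "first dominating point" means the earliest point in stream order that dominates a given point, and it again uses the smaller-is-better convention; `dominance_forest` and `trees` are hypothetical helper names.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[int, ...]

def dominates(p: Point, q: Point) -> bool:
    # Smaller-is-better convention from the running example.
    return all(a <= b for a, b in zip(p, q)) and p != q

def dominance_forest(stream: Sequence[Point]) -> Dict[int, Optional[int]]:
    """Map each stream index to its parent's index (the earliest point in
    stream order that dominates it), or to None for a root."""
    return {
        i: next((j for j, p in enumerate(stream) if dominates(p, q)), None)
        for i, q in enumerate(stream)
    }

def trees(stream: Sequence[Point]) -> Dict[Point, List[Point]]:
    """Group the points by the root reached when following parent pointers."""
    parent = dominance_forest(stream)
    members: Dict[Point, List[Point]] = {}
    for i in range(len(stream)):
        r = i
        # Each step moves to a strictly dominating point, so this terminates.
        while parent[r] is not None:
            r = parent[r]
        members.setdefault(stream[r], []).append(stream[i])
    return members

STREAM = [(1, 5), (3, 4), (4, 5), (4, 3), (3, 3), (4, 4)]
print(trees(STREAM))
# {(1, 5): [(1, 5), (4, 5)], (3, 3): [(3, 4), (4, 3), (3, 3), (4, 4)]}
```

Under this reading, the example stream yields two trees rooted at its two skyline points, (1,5) and (3,3). Only skyline points lack a dominating point, so they are exactly the roots.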
Note: there will be m trees (numbered 1, 2, …, m), each rooted by a skyline point.

Claim: if any element of a tree is sampled, that tree will be deleted.
◦ A big tree has a bigger chance of being sampled, and therefore of being deleted.

Analysis
◦ Theorem: the ELIMINATE-POINTS algorithm deletes at least n/2 points with probability at least 1/2.

Streaming algorithm
◦ 1. Let n be the number of points in the input stream; let m' = 1.
◦ 2. Let n' be the number of points currently in the stream; call ELIMINATE-POINTS(m').
◦ 3. If more than n'/2 points are left in the stream, set m' = 2m'.
◦ 4. Go back to step 2 until the input stream is empty.

Fixed-window algorithm
◦ 1. Store only O(w) points of the stream.
◦ 2. Call ELIMINATE-POINTS(w/24), whose sample of 24 · (w/24) = w points fits in this space.
◦ 3. Go back to step 2 until the input stream is empty.

Experimental Results
[Figures: experimental results, including running time]

Conclusions
◦ Extend the methodology to preprocessing algorithms
◦ Contribution
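One way to see why big trees drive the ELIMINATE-POINTS analysis, sketched under two assumptions that do not come from the slides: the x = 24m sample points are drawn independently and uniformly from the n points, and the claim that a tree containing a sampled point is deleted holds. This is an illustrative argument, not necessarily the paper's proof. Let the trees have sizes s_1, …, s_m with s_1 + … + s_m = n, and take n/(4m) as a (hypothetical) threshold for a "big" tree.

```latex
\text{Call tree } j \text{ big if } s_j \ge \frac{n}{4m}.
\quad \text{Small trees hold fewer than } m \cdot \frac{n}{4m} = \frac{n}{4} \text{ points, so big trees hold more than } \frac{3n}{4}.

\Pr[\text{big tree } j \text{ receives no sample}]
  = \left(1 - \frac{s_j}{n}\right)^{24m}
  \le \left(1 - \frac{1}{4m}\right)^{24m}
  \le e^{-6}
  \qquad (\text{using } 1 - t \le e^{-t}).

U := \#\{\text{points in big trees that receive no sample}\},
\qquad \mathbb{E}[U] \le e^{-6}\, n,
\qquad \Pr\!\left[U \ge \frac{n}{4}\right] \le \frac{\mathbb{E}[U]}{n/4} \le 4e^{-6} \approx 0.0099 \quad (\text{Markov}).

\text{With probability at least } 1 - 4e^{-6} > \tfrac{1}{2}:
\quad \#\text{deleted} \ >\ \frac{3n}{4} - U \ >\ \frac{3n}{4} - \frac{n}{4} = \frac{n}{2}.
```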
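The streaming algorithm's doubling of m' can be sketched on top of a compact ELIMINATE-POINTS. As before, this is a simulation under assumptions, not the paper's code: the stream is an in-memory list, sampling is uniform with replacement, points use the smaller-is-better convention, and `streaming_skyline`, `m_prime`, and `n_prime` are illustrative names.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[int, ...]

def dominates(p: Point, q: Point) -> bool:
    # Smaller-is-better convention from the running example.
    return all(a <= b for a, b in zip(p, q)) and p != q

def eliminate_points(stream: Sequence[Point], m: int, rng: random.Random):
    """Compact ELIMINATE-POINTS(m); assumes uniform sampling with replacement."""
    sample = rng.choices(list(stream), k=24 * m)   # pass 1: sample 24m points
    for p in stream:                               # pass 2: refine the sample
        sample = [p if dominates(p, s) else s for s in sample]
    found = list(dict.fromkeys(sample))            # distinct skyline points
    remaining = [q for q in stream                 # pass 3: delete
                 if q not in found and not any(dominates(s, q) for s in found)]
    return found, remaining

def streaming_skyline(stream: Sequence[Point], seed: int = 0) -> List[Point]:
    rng = random.Random(seed)
    skyline: List[Point] = []
    m_prime = 1                          # step 1: guess m' = 1
    current = list(stream)
    while current:                       # step 4: until the stream is empty
        n_prime = len(current)           # step 2: n' = points still in the stream
        found, current = eliminate_points(current, m_prime, rng)
        skyline.extend(found)
        if len(current) > n_prime / 2:   # step 3: fewer than half deleted, double m'
            m_prime *= 2
    return skyline

print(sorted(streaming_skyline([(1, 5), (3, 4), (4, 5), (4, 3), (3, 3), (4, 4)])))
# [(1, 5), (3, 3)]
```

Each call outputs at least one skyline point and removes it, so the loop terminates; every deleted point is an output point or dominated by one, so no skyline point is lost.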
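The fixed-window variant differs from the streaming algorithm only in that the sample budget is fixed instead of doubled. A minimal sketch, assuming the same in-memory simulation and sampling rule as above; `fixed_window_skyline` is a hypothetical name, and using `w // 24` (at least 1) for w/24 is an assumption about rounding.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[int, ...]

def dominates(p: Point, q: Point) -> bool:
    # Smaller-is-better convention from the running example.
    return all(a <= b for a, b in zip(p, q)) and p != q

def eliminate_points(stream: Sequence[Point], m: int, rng: random.Random):
    """Compact ELIMINATE-POINTS(m); assumes uniform sampling with replacement."""
    sample = rng.choices(list(stream), k=24 * m)
    for p in stream:
        sample = [p if dominates(p, s) else s for s in sample]
    found = list(dict.fromkeys(sample))
    remaining = [q for q in stream
                 if q not in found and not any(dominates(s, q) for s in found)]
    return found, remaining

def fixed_window_skyline(
    stream: Sequence[Point], w: int, seed: int = 0
) -> Tuple[List[Point], int]:
    """Return (skyline, number of ELIMINATE-POINTS calls) for a fixed budget w."""
    rng = random.Random(seed)
    # ELIMINATE-POINTS(w/24) samples 24 * (w // 24) points, i.e. w when 24 divides w.
    m = max(1, w // 24)
    skyline: List[Point] = []
    current = list(stream)
    rounds = 0
    while current:               # step 3: repeat until the stream is empty
        found, current = eliminate_points(current, m, rng)
        skyline.extend(found)
        rounds += 1
    return skyline, rounds
```

Keeping m fixed bounds the memory used by each call, at the cost of possibly needing more calls (and hence more passes) than the doubling strategy when the skyline is large.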