Randomized Multi-pass Streaming Skyline Algorithms
VLDB '09
Outline
Introduction
Algorithms
◦ Streaming algorithm
◦ Fixed-Window algorithm
Experimental Results
Conclusions

Introduction

Skyline
◦ The set of points (tuples) in a d-dimensional dataset that are not dominated by any other point
Hotel   Guest Rating   Price
A       5.0            $63.5
B       4.1            $105.8
C       3.0            $62.17
D       4.7            $228.37
E       4.6            $126.26
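
As a small illustration of the definition, the sketch below computes the skyline of the hotel table. It assumes, which the slide does not state, that a higher guest rating and a lower price are both preferred; the names `hotels`, `dominates` and `skyline` are made up for this example.

```python
# Hotel table from the slide: name -> (guest rating, price in dollars).
hotels = {
    "A": (5.0, 63.5),
    "B": (4.1, 105.8),
    "C": (3.0, 62.17),
    "D": (4.7, 228.37),
    "E": (4.6, 126.26),
}


def dominates(p, q):
    """p dominates q: at least as good in both attributes, strictly better in one.

    Assumption: higher rating is better, lower price is better.
    """
    (rating_p, price_p), (rating_q, price_q) = p, q
    no_worse = rating_p >= rating_q and price_p <= price_q
    strictly_better = rating_p > rating_q or price_p < price_q
    return no_worse and strictly_better


def skyline(items):
    """Names of all items that no other item dominates (quadratic brute force)."""
    return sorted(
        name
        for name, p in items.items()
        if not any(dominates(q, p) for q in items.values())
    )


print(skyline(hotels))  # ['A', 'C']
```

Under that assumption, A dominates B, D and E, while C survives only because it is slightly cheaper than A.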
Introduction

Compute the skyline of a massive
database with strong worst-case
performance guarantees
Algorithms

Main Idea
◦ Suppose m, the number of skyline points, is known
◦ Theorem: In 3 passes and O(m) space, we can find skyline points that "dominate" at least n/2 points, with high probability
Algorithms
Stream: p1, p2, …, pn
 1. Sample x = 24m points p'1, p'2, …, p'x
 2. Go through the stream; replace each p'i by a point dominating it
 3. For each p'i, delete p'i and all points it dominates; output p'1, p'2, …, p'x

Example stream: (1,5), (3,4), (4,5), (4,3), (3,3), (4,4)
Sample: (4,4), replaced by (3,4), then by (3,3)
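
The three steps of this slide can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it assumes that smaller is better in every coordinate (consistent with the sample values shown on the slide), it keeps the "stream" as an in-memory list that is scanned once per pass, and it samples with replacement. The `sample` parameter is a hypothetical hook for reproducing the slide example deterministically.

```python
import random


def dominates(p, q):
    """p dominates q: no larger in any coordinate and not identical.

    Assumption: smaller is better in every dimension.
    """
    return p != q and all(a <= b for a, b in zip(p, q))


def eliminate_points(stream, m, rng=random, sample=None):
    """One round of the three-pass procedure on a list standing in for the stream.

    Returns (skyline points found, points left in the stream).
    """
    if not stream:
        return [], []
    # Pass 1: sample x = 24m points (uniformly, with replacement in this sketch).
    x = 24 * m
    reps = list(sample) if sample is not None else rng.choices(stream, k=x)
    # Pass 2: scan the stream; replace each sampled point by a point dominating it.
    for p in stream:
        for i, r in enumerate(reps):
            if dominates(p, r):
                reps[i] = p
    # Pass 3: delete every sampled point and every point it dominates.
    found = sorted(set(reps))
    remaining = [
        p for p in stream
        if not any(p == r or dominates(r, p) for r in found)
    ]
    return found, remaining


stream = [(1, 5), (3, 4), (4, 5), (4, 3), (3, 3), (4, 4)]
found, rest = eliminate_points(stream, m=1, sample=[(4, 4)])
print(found, rest)  # [(3, 3)] [(1, 5)]
```

With the sample fixed to (4,4), the scan replaces it by (3,4) and then by (3,3), and the deletion pass leaves only (1,5) in the stream.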
Algorithms

Draw trees: each point points to its first dominating point
Note: There will be m trees, each rooted at a skyline point
[Figure: the example points (1,5), (3,4), (4,5), (4,3), (3,3), (4,4) drawn as trees]
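
One way to read "first dominating point" is the first point in stream order that dominates a given point. The sketch below builds the parent pointers under that reading. Both the reading and the stream order (taken from the order the points appear on the slide) are assumptions, as is the convention that smaller is better; `first_dominator_forest` is a name invented for this example.

```python
def dominates(p, q):
    """p dominates q: no larger in any coordinate and not identical.

    Assumption: smaller is better in every dimension.
    """
    return p != q and all(a <= b for a, b in zip(p, q))


def first_dominator_forest(stream):
    """Map each point to the first point in stream order that dominates it.

    Points that nothing dominates (the skyline points) map to None and are
    the roots of the trees. Quadratic time; meant for small examples only.
    """
    return {
        p: next((q for q in stream if dominates(q, p)), None)
        for p in stream
    }


stream = [(1, 5), (3, 4), (4, 5), (4, 3), (3, 3), (4, 4)]
parent = first_dominator_forest(stream)
roots = [p for p in stream if parent[p] is None]
print(roots)  # [(1, 5), (3, 3)]
```

Because every parent strictly dominates its child, following the pointers can never cycle, so each point ends at a root, and the roots are exactly the skyline points.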
Algorithms

Claim: Any tree from which some element is sampled will be deleted
[Figure: the example trees over the points (1,5), (3,4), (4,5), (4,3), (3,3), (4,4)]
Algorithms

There are m trees, each rooted at a skyline point
[Figure: trees 1, 2, …, m]
Algorithms

A bigger tree has a bigger chance of being sampled and deleted
[Figure: trees 1, 2, …, m]
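
One way to quantify this intuition, under the assumption that the x samples are drawn independently and uniformly (with replacement) from the n stream points, is to bound the probability that a tree T containing s points receives no sample. This is a sketch of the counting argument and not necessarily the proof given in the paper.

```latex
\Pr[\text{no sample lands in } T]
  = \left(1 - \frac{s}{n}\right)^{x}
  \le e^{-xs/n},
\qquad
x = 24m,\quad s \ge \frac{n}{2m}
\;\Longrightarrow\;
e^{-xs/n} \le e^{-12}.
```

Trees with fewer than n/(2m) points contain fewer than m · n/(2m) = n/2 points in total, so at least n/2 points lie in trees that are each missed with probability at most e^{-12}.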
Algorithms

Analysis
◦ Theorem: The ELIMINATE-POINTS algorithm deletes at least n/2 points with probability 1/2
Algorithms

Streaming algorithm
◦ 1. Let n be the number of points in the input stream; let m' = 1
◦ 2. Let n' be the number of points currently in the stream; call ELIMINATE-POINTS(m')
◦ 3. If more than n'/2 points are left in the stream, set m' = 2m'
◦ 4. Repeat from step 2 until the input stream is empty
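
The doubling loop above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: smaller is better in every coordinate, the stream is an in-memory list, sampling is with replacement, and `eliminate_points`, `streaming_skyline` and `m_guess` (standing for m') are names chosen here.

```python
import random


def dominates(p, q):
    # Assumption: smaller is better in every coordinate.
    return p != q and all(a <= b for a, b in zip(p, q))


def eliminate_points(stream, m, rng):
    """Sketch of one ELIMINATE-POINTS(m) round on a non-empty list."""
    reps = rng.choices(stream, k=24 * m)      # pass 1: sample x = 24m points
    for p in stream:                          # pass 2: replace by dominating points
        reps = [p if dominates(p, r) else r for r in reps]
    found = set(reps)
    rest = [                                  # pass 3: delete sampled and dominated points
        p for p in stream
        if p not in found and not any(dominates(r, p) for r in found)
    ]
    return found, rest


def streaming_skyline(points, seed=0):
    rng = random.Random(seed)
    stream = list(points)
    skyline = set()
    m_guess = 1                               # step 1: m' = 1
    while stream:                             # step 4: repeat until the stream is empty
        n_cur = len(stream)                   # step 2: n' = current stream size
        found, stream = eliminate_points(stream, m_guess, rng)
        skyline |= found
        if len(stream) > n_cur / 2:           # step 3: too few points deleted, double m'
            m_guess *= 2
    return sorted(skyline)


print(streaming_skyline([(1, 5), (3, 4), (4, 5), (4, 3), (3, 3), (4, 4)]))
# [(1, 5), (3, 3)]
```

Each round removes at least the sampled points themselves, so the loop always terminates. Only the number of rounds, and hence the number of passes, depends on how lucky the sampling is.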
Algorithms

Fixed-window algorithm
◦ 1. Store only O(w) points of the stream
◦ 2. Call ELIMINATE-POINTS(w/24)
◦ 3. Go to step 2 until the input stream is empty
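
Since ELIMINATE-POINTS(m) samples 24m points, calling it with m = w/24 amounts to sampling w points per round. A minimal sketch of the resulting loop is shown below, with the same assumptions as before (smaller is better, in-memory list as the stream, sampling with replacement) and a hypothetical function name.

```python
import random


def dominates(p, q):
    # Assumption: smaller is better in every coordinate.
    return p != q and all(a <= b for a, b in zip(p, q))


def fixed_window_skyline(points, w, seed=0):
    """Repeat rounds that keep only w sampled points in memory at a time."""
    rng = random.Random(seed)
    stream, skyline = list(points), set()
    while stream:
        # ELIMINATE-POINTS(w/24): the sample size is 24 * (w/24) = w.
        reps = rng.choices(stream, k=w)
        for p in stream:                      # replace by dominating points
            reps = [p if dominates(p, r) else r for r in reps]
        found = set(reps)
        skyline |= found
        stream = [                            # delete sampled and dominated points
            p for p in stream
            if p not in found and not any(dominates(r, p) for r in found)
        ]
    return sorted(skyline)


print(fixed_window_skyline([(1, 5), (3, 4), (4, 5), (4, 3), (3, 3), (4, 4)], w=24))
# [(1, 5), (3, 3)]
```

Unlike the doubling variant, the memory budget never grows here. A skyline larger than the window simply takes more rounds to output.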
Experimental Results

Running time
Conclusions

Extend the methodology to a preprocessing algorithm

Contribution