Operator Placement for In-Network Stream Query Processing

Outline
Introduction
Preliminaries
Filter placement
Extensions
Conclusions
Introduction
In-network query processing
Consider a video surveillance application
 Environment
 Target
 Suspicious activity
dark, movement
 Need
 filter for calculating intensity (F1)
 filter for detecting sufficient motion (F2)
Introduction
Previous work
push down all filters
 since CPU cost << communication cost
What if the queries involve expensive predicates?
Objective
place each filter at the “best” node
 based on selectivity and cost
minimize the overall cost
Introduction
 Operator placement problem
 Tradeoff
 Lower computational costs
 Put on the nodes higher up
 Lower transmission cost
 Put on the nodes lower down
 Candidate
 m-level hierarchy
 n filters
 m^n possible solutions
 In this paper…
 Key idea
 Model network links as filters
 Content
define the problem
provide a greedy algorithm and show that it can fail to find the optimal placement
present a polynomial-time optimal algorithm
extend to multiway stream joins
…
Preliminaries
 Consider a linear chain of nodes
 Notation
 S = data acquired by node N1
 F = { F1, F2, …, Fn }
 Query
Cost Model
 Three quantities
[Figure: filter F with input r and output s(F)r]
 Selectivity of filter F : s(F)
 fraction of the tuples in stream S that are expected to satisfy F
 Cost of filter F : c(F, i)
 per-tuple cost of execution on node Ni
 c(F, i+1) = γi c(F, i)
 γi ≤ 1 (if γi > 1, …)
 Cost of network transmission : li
 per-tuple cost of transmitting from Ni to Ni+1
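The cost-scaling rule can be sketched in a few lines. This is a minimal illustration, assuming the per-hop scaling factors are written γi and passed as a list `gammas` (a hypothetical name, with `gammas[0]` the factor between N1 and N2); `cost_at_node` is our own helper and does not come from the slides.

```python
from fractions import Fraction
from math import prod

def cost_at_node(base_cost, gammas, i):
    """Per-tuple cost c(F, i) of a filter whose cost on N1 is base_cost.

    Applies c(F, i+1) = gamma_i * c(F, i) repeatedly, so
    c(F, i) = c(F, 1) * gamma_1 * ... * gamma_(i-1).
    """
    if not 1 <= i <= len(gammas) + 1:
        raise ValueError("node index out of range")
    return base_cost * prod(gammas[:i - 1])

# Scaling factors that appear in the arithmetic of Example 2.2: 1/5, 1/2, 1/4.
gammas = [Fraction(1, 5), Fraction(1, 2), Fraction(1, 4)]
print(cost_at_node(1300, gammas, 3))  # 130
print(cost_at_node(2500, gammas, 4))  # 125/2
```

Exact fractions are used only to keep the arithmetic free of rounding; plain floats would work as well.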
Cost Model
 Notation
P(F) = i if filter F is executed on Ni
Fi = { F | P(F) = i }
F’ = F’1, F’2, …, F’n’
c(F’, i) = the cost per tuple of executing F’ at node Ni
r(Fi) = Fi in rank order
 Ref. J. Hellerstein and M. Stonebraker. Predicate migration: Optimizing queries with expensive predicates. 1993
 Cost on a single node
 Overall cost
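Written out, the two costs can take the following form. This is a sketch inferred from the arithmetic of Example 2.2, not a quotation of the original formulas. It assumes that a product over an empty set equals 1 and that l_m = 0 at the root N_m, so that one expression covers every node.

```latex
% Cost on a single node N_i, with r(F_i) = F'_1, ..., F'_{n'} in rank order
c\bigl(r(F_i), i\bigr) = \sum_{j=1}^{n'} \Bigl( \prod_{k=1}^{j-1} s(F'_k) \Bigr)\, c(F'_j, i)

% Overall cost of plan P on an m-node chain
c(P) = \sum_{i=1}^{m} \Bigl( \prod_{F \in F_1 \cup \dots \cup F_{i-1}} s(F) \Bigr)
       \Bigl[ c\bigl(r(F_i), i\bigr) + \Bigl( \prod_{F \in F_i} s(F) \Bigr) l_i \Bigr]
```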

Example 2.2
s(F) = 1/2
c(P) = c(F1, 1) +
s(F1) c(F2, 1) +
s(F1) s(F2) [ l1 + l2 + c(F3, 3) ] +
s(F1) s(F2) s(F3) [ l3 + c(F4, 4) ]
= 200 +
(½) 400 +
(½) (½) [ 700 + 500 + (1/5) (1/2) 1300 ] +
(½) (½) (½) [ 300 + (1/5) (1/2) (1/4) 2500 ]
= 200 + 200 + 332.5 + 45.3125 = 777.8125
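The arithmetic of Example 2.2 can be reproduced with a short script. This is a minimal sketch: `plan_cost` and its parameter names are hypothetical, the scaling factors 1/5, 1/2, 1/4 are read off the example, and the function assumes a linear chain with every selectivity below 1.

```python
from fractions import Fraction
from math import prod

def plan_cost(placement, base_costs, sels, links, gammas):
    """Expected per-tuple cost c(P) of a placement on a linear chain.

    placement[k] is the 1-based node that runs filter k.
    base_costs[k] is c(F, 1); sels[k] is s(F) and must be < 1.
    links[i-1] is l_i; gammas[i-1] scales costs from N_i to N_(i+1).
    """
    m = len(links) + 1
    total = Fraction(0)
    alive = Fraction(1)  # fraction of tuples that survived so far
    for i in range(1, m + 1):
        scale = prod(gammas[:i - 1])  # c(F, i) = c(F, 1) * scale
        here = [k for k, node in enumerate(placement) if node == i]
        # Filters on one node run in rank order, rank = c / (1 - s).
        here.sort(key=lambda k: base_costs[k] / (1 - sels[k]))
        for k in here:
            total += alive * base_costs[k] * scale
            alive *= sels[k]
        if i < m:
            total += alive * links[i - 1]  # ship survivors to N_(i+1)
    return total

half = Fraction(1, 2)
cost = plan_cost(
    placement=[1, 1, 3, 4],            # F1, F2 on N1; F3 on N3; F4 on N4
    base_costs=[200, 400, 1300, 2500],
    sels=[half, half, half, half],
    links=[700, 500, 300],
    gammas=[Fraction(1, 5), Fraction(1, 2), Fraction(1, 4)],
)
print(float(cost))  # 777.8125
```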
Filter Placement
1. Greedy algorithm
2. Optimal algorithm
Greedy algorithm
Notation
c(P, i) = part of the total cost c(P) incurred at Ni
 including transmission from Ni to Ni+1
network link Ni to Ni+1, modeled as a filter Fli :
 s(Fli) = 0, c(Fli, 1) = li
Example 3.3
At N1, r(F1) = 400, r(F2) = 800, r(F3) = 2600, r(F4) = 5000, r(Fl1) = 700 > r(F1)
At N2, r(F2) = 160, r(F3) = 520, r(F4) = 1000, r(Fl2) = 500 > r(F2)
At N3, r(F3) = 260, r(F4) = 500, r(Fl3) = 300 > r(F3)
At N4, r(F4)
c(P) = 200 + 350 + 40 + 125 + 32.5 + 37.5 + 7.8125 = 792.8125
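The node-by-node decisions of Example 3.3 can be sketched as follows. This is an illustrative reading of the greedy rule, not the authors' code: `greedy_placement` is a hypothetical name, the scaling factors 1/5, 1/2, 1/4 are taken from Example 2.2, and the rule "keep a filter at Ni when its rank there is below li" is inferred from the comparisons in the example.

```python
from fractions import Fraction
from math import prod

def greedy_placement(base_costs, sels, links, gammas):
    """Place filters node by node, deciding locally at each node.

    At N_i the outgoing link is treated as a filter of selectivity 0 and
    cost l_i, so its rank is l_i. Every remaining filter whose rank at N_i
    is below l_i runs there; the rest travel upward.
    Returns the 1-based node chosen for each filter.
    """
    m = len(links) + 1
    placement = [m] * len(base_costs)   # leftovers run at the root N_m
    remaining = list(range(len(base_costs)))
    for i in range(1, m):
        scale = prod(gammas[:i - 1])    # c(F, i) = c(F, 1) * scale
        stay = [k for k in remaining
                if base_costs[k] * scale / (1 - sels[k]) < links[i - 1]]
        for k in stay:
            placement[k] = i
        remaining = [k for k in remaining if k not in stay]
    return placement

half = Fraction(1, 2)
greedy = greedy_placement(
    base_costs=[200, 400, 1300, 2500],
    sels=[half, half, half, half],
    links=[700, 500, 300],
    gammas=[Fraction(1, 5), Fraction(1, 2), Fraction(1, 4)],
)
print(greedy)  # [1, 2, 3, 4]
```

One filter per node is the placement whose cost the example totals to 792.8125.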
Optimal algorithm
 Notation
 network link Ni to Ni+1, modeled as a filter Fli :
 s(Fli) = γi , c(Fli, 1) = li / (γ1 γ2 … γ(i-1))
Optimal algorithm
 Short-circuiting
 Rank
 Cost scaleup
Optimal algorithm
Example 3.7
Model links as filters
r(Fl2,4) = 4571.42857142857
r(F1) = 400, r(F2) = 800, r(F3) = 2600, r(F4) = 5000, r(Fl1 ) = 875, r(Fl2,4 ) = 4571.4
r(F1) < r(F2) < r(Fl1 ) < r(F3) < r(Fl2,4 ) < r(F4)
c(P) = 200 + 200 + 175 + 65 + 100 + 7.8125 = 747.8125
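The rank ordering of Example 3.7 can be reproduced with a small sketch. Several things here are assumptions rather than statements from the slides: the link filter for Ni to Ni+1 is given selectivity γi and node-1 cost li / (γ1 … γ(i-1)), which is the choice that yields the ranks 875 and 4571.4 quoted in the example; short-circuiting is implemented as repeatedly merging a link with its successor while their ranks are out of order; every γi and every selectivity is taken to be below 1. All function names are hypothetical.

```python
from fractions import Fraction
from math import prod

def rank(cost, sel):
    """Rank of a filter: cost / (1 - selectivity)."""
    return cost / (1 - sel)

def optimal_placement(base_costs, sels, links, gammas):
    """Sketch of link-as-filter placement on a linear chain."""
    groups = []  # each entry: [cost at N1, selectivity, links covered]
    for i, (l, g) in enumerate(zip(links, gammas), start=1):
        groups.append([Fraction(l) / prod(gammas[:i - 1]), g, 1])
        # Short-circuit: links must keep their order, so a link whose rank
        # exceeds its successor's is merged with it into one composite link.
        while len(groups) > 1 and rank(*groups[-2][:2]) > rank(*groups[-1][:2]):
            c2, s2, n2 = groups.pop()
            c1, s1, n1 = groups.pop()
            groups.append([c1 + s1 * c2, s1 * s2, n1 + n2])
    placement = []
    for c, s in zip(base_costs, sels):
        # A filter runs just above every (composite) link ranked below it.
        crossed = sum(n for lc, ls, n in groups if rank(lc, ls) < rank(c, s))
        placement.append(1 + crossed)
    return placement, [rank(lc, ls) for lc, ls, _ in groups]

half = Fraction(1, 2)
placement, link_ranks = optimal_placement(
    base_costs=[200, 400, 1300, 2500],
    sels=[half, half, half, half],
    links=[700, 500, 300],
    gammas=[Fraction(1, 5), Fraction(1, 2), Fraction(1, 4)],
)
print(placement)   # [1, 1, 2, 4]
print(link_ranks)  # [Fraction(875, 1), Fraction(32000, 7)]
```

The second link rank, 32000/7, is about 4571.43 and belongs to the composite link from N2 to N4. The placement puts F1 and F2 on N1, F3 on N2 and F4 on N4, which is the plan costed at 747.8125 in the example.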
Extensions
Correlated filters
Tree hierarchies
Joins
Other extensions
Correlated filters
 Definition
 Conditional selectivity
 s(F|Q) = the fraction of tuples that satisfy F given that they satisfy all the filters in Q
 Reference
 Optimal ordering of correlated filters at a single node
 NP-hard
 guaranteed to find a cost at most 4 times the opt. cost
 Approximation ratio of 4
 the best possible unless P = NP
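The definition of conditional selectivity can be made concrete by estimating it from a sample of tuples. This is a minimal sketch: the helper name, the sample readings and the two predicates are all hypothetical, loosely modeled on the surveillance example from the introduction.

```python
from fractions import Fraction

def conditional_selectivity(sample, f, given):
    """Estimate s(F | Q) from a sample of tuples.

    f is a predicate; given is the list of predicates in Q.
    Returns None when no sampled tuple satisfies all of Q.
    """
    passed_q = [t for t in sample if all(q(t) for q in given)]
    if not passed_q:
        return None
    return Fraction(sum(1 for t in passed_q if f(t)), len(passed_q))

# Hypothetical (intensity, motion) readings.
sample = [(10, 20), (20, 15), (30, 5), (80, 30), (90, 2), (70, 1)]
dark = lambda t: t[0] < 50
moving = lambda t: t[1] > 10

print(conditional_selectivity(sample, moving, []))      # 1/2
print(conditional_selectivity(sample, moving, [dark]))  # 2/3
```

In this made-up sample the two filters are correlated: motion is more likely among dark frames (2/3) than overall (1/2), so the unconditional s(F) would misestimate the work left after the first filter.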

Correlated filters
 Definition



 Short-circuiting
 Optimal solution
 Tree hierarchy

 Each of the queries operates on different data.
 There is no sharing of computation or transmission among them.
Joins
 Problem
 k different data streams acquired by N1
 Solution
 Reference
 Sliding-window join
 MJoin operator
 at a single node
 join tree is left as future work
 Query
 W1 and W2 represent the lengths of the windows (time-based or tuple-based) on streams S1 and S2.
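A tuple-based sliding-window join over two streams can be sketched as below. This is a plain symmetric binary join written for illustration, not the MJoin operator named on the slide; the function name, the event encoding and the equality predicate are all assumptions.

```python
from collections import deque

def sliding_window_join(events, w1, w2, predicate):
    """Join two interleaved streams over tuple-based windows.

    events is an iterable of (stream_id, value) with stream_id 1 or 2.
    w1 and w2 are the window lengths, in tuples, on S1 and S2.
    Each arriving tuple is matched against the other stream's window,
    then added to its own window (the oldest tuple falls out when full).
    """
    windows = {1: deque(maxlen=w1), 2: deque(maxlen=w2)}
    results = []
    for stream_id, value in events:
        other = 2 if stream_id == 1 else 1
        for old in windows[other]:
            pair = (value, old) if stream_id == 1 else (old, value)
            if predicate(*pair):
                results.append(pair)  # always ordered (S1 value, S2 value)
        windows[stream_id].append(value)
    return results

events = [(1, "a"), (2, "a"), (2, "b"), (1, "b"), (2, "a")]
joined = sliding_window_join(events, w1=1, w2=2, predicate=lambda x, y: x == y)
print(joined)  # [('a', 'a'), ('b', 'b')]
```

With `w1=1` the last S2 tuple finds only `"b"` in the S1 window, so it produces no match; a longer window on S1 would change the result, which is the effect the window lengths control.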
Joins
Join operator
Illustration
[Figure: join with inputs r1 and r2 and output s(⋈) r1 r2]
Selectivity
 s(⋈) = the fraction of the cross product that occurs in the join result
Cost
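The selectivity definition on this slide, the fraction of the cross product that appears in the join result, can be computed directly for two small windows. This is a minimal sketch with hypothetical names and data; it illustrates the selectivity only and says nothing about the join cost.

```python
from fractions import Fraction

def join_selectivity(left, right, predicate):
    """Fraction of the cross product left x right that satisfies predicate."""
    pairs = len(left) * len(right)
    if pairs == 0:
        return Fraction(0)
    matches = sum(1 for a in left for b in right if predicate(a, b))
    return Fraction(matches, pairs)

left = [1, 2, 3]
right = [2, 3, 4, 5]
s = join_selectivity(left, right, lambda a, b: a == b)
print(s)  # 1/6
```

Here 2 of the 12 candidate pairs match, so with inputs r1 and r2 the expected output is one sixth of r1 r2.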

Joins
Notation
Fi = filters that can be applied on Si either before or after the join
| Fi | = ni
F12 = filters that can be applied only after the join

Joins
 Time complexity : O(n2n1m(n+m)log(n+m))
Extensions
 Constrained nodes



 Per-filter cost scaling
 c(F, i+1) / c(F, i) may be different for different F.
 Modeling network links as filters no longer applies.
 It becomes NP-hard.
Conclusion
 Environment
 Operator placement problem
 Tradeoff
 Lower computational costs
 Put on the nodes higher up
 Lower transmission cost
 Put on the nodes lower down
 Provide
 Greedy alg. & Optimal alg.
 Extensions
Lemma 3.1
by (2)
Theorem 3.2
F1 in P is chosen according to the theorem.
∵ Lemma 3.1 and s(Fl1) = 0 ∴ ∃ F’1 in P’ s.t. c( P’, 1 ) < c( P, 1 )
∵ Theorem 2.1 ∴ c( P, 1 ) ≦ c( P’, 1 ) → contradiction
Lemma 3.4
i 2
i 2
j 1
j 1
i 2
li  1    j [li  1(  j ) ]    j [c( Fi l 1 ,1)]
1
j 1
Theorem 3.5
F1 in P is chosen according to the theorem.
∵ Lemma 3.4 ∴ ∃ P’ s.t. c( P’, 1 ) < c( P, 1 )
∵ Theorem 2.1 ∴ c( P, 1 ) ≦ c( P’, 1 ) → contradiction
Lemma 3.6
Suppose
and the best
Moving the filters on node Ni to Ni-1
Moving the filters on node Ni to Ni+1
∵ P is best plan ∴ c( P) < c( P’) , c( P) < c( P”)
→
implies → contradiction