Operator Placement for In-Network Stream Query Processing
Outline
Introduction
Preliminaries
Filter placement
Extensions
Conclusions
Introduction
In-network query processing
Consider a video surveillance application
Environment
Target
Suspicious activity
dark, movement
Need
filter for calculating intensity (F1)
filter for detecting sufficient motion (F2)
Introduction
Previous work
push down all filters
since CPU cost << communication cost
What if the queries involve expensive predicates?
Objective
place each filter at the "best" node
based on selectivity and cost
minimize the overall cost
Introduction
Operator placement problem
Tradeoff
Lower computational costs
Put on the nodes higher up
Lower transmission cost
Put on the nodes lower down
Candidate
m-level hierarchy
n filters
m^n possible solutions
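To make the size of the candidate space concrete, the sketch below enumerates every assignment of n filters to the nodes of an m-level chain. The function name `all_placements` is illustrative and does not come from the paper.

```python
from itertools import product

def all_placements(m, n):
    """Return an iterator over every assignment of n filters to nodes 1..m.

    Each placement is a tuple p where p[k] is the node that runs filter k+1.
    """
    return product(range(1, m + 1), repeat=n)

m, n = 4, 4
count = sum(1 for _ in all_placements(m, n))
print(count, m ** n)   # both numbers are the size of the search space
```

Each filter independently picks one of m nodes, which is why exhaustive search grows exponentially in the number of filters.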
In this paper…
Key idea
Model network links as filters
Content
define the problem
present a greedy algorithm and show that it can be suboptimal
present a polynomial-time optimal algorithm
extend to multiway stream join
…
Preliminaries
Consider a linear chain of nodes
Notation
S = data acquired by node N1
F = { F1, F2, …, Fn }
Query
Cost Model
Three quantities
(Figure: a stream with rate r enters filter F and leaves with rate s(F)·r)
Selectivity of filter F : s(F)
fraction of the tuples in stream S that are expected to satisfy F
Cost of filter F : c(F, i)
per-tuple cost of execution on node Ni
c(F, i+1) = γi c(F, i)
γi ≤ 1 (if γi > 1 …)
Cost of network transmission : li
per-tuple cost of transmitting from Ni to Ni+1
Cost Model
Notation
P(F) = i if filter F is executed on Ni
Fi = { F | P(F) = i }
F’ = F’1, F’2, …, F’n’
c(F’, i) = the cost per tuple of executing F’ at node Ni
r(Fi) = Fi in rank order
Ref. J. Hellerstein and M. Stonebraker. Predicate migration: Optimizing queries with expensive predicates. 1993.
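As a small illustration of rank ordering on a single node, the sketch below assumes independent filters and takes the rank to be cost / (1 − selectivity), which agrees with the rank values quoted in the examples of this deck (for instance 200 / (1 − 1/2) = 400). The three filters are hypothetical; the code checks by brute force that the rank order is no more expensive than any other order.

```python
from fractions import Fraction
from itertools import permutations

def rank(cost, sel):
    """Rank of a filter: per-tuple cost divided by (1 - selectivity)."""
    return cost / (1 - sel)

def expected_cost(filters):
    """Expected per-tuple cost of running (cost, selectivity) filters in order.

    A tuple reaches a filter only if it passed all earlier ones, so each
    cost is weighted by the product of the earlier selectivities.
    """
    total, surviving = Fraction(0), Fraction(1)
    for cost, sel in filters:
        total += surviving * cost
        surviving *= sel
    return total

# Hypothetical filters as (cost, selectivity) pairs.
filters = [(Fraction(30), Fraction(9, 10)),
           (Fraction(10), Fraction(1, 2)),
           (Fraction(20), Fraction(1, 5))]

by_rank = sorted(filters, key=lambda f: rank(*f))
best = min(expected_cost(p) for p in permutations(filters))
print(expected_cost(by_rank) == best)
```

Exact fractions are used so that the comparison is not affected by floating-point rounding.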
Cost on a single node
Overall cost
Example 2.2
s(F) = 1/2 for every filter F
c(P) = c(F1, 1) +
s(F1) c(F2, 1) +
s(F1) s(F2) [ l1 + l2 + c(F3, 3) ] +
s(F1) s(F2) s(F3) [ l3 + c(F4, 4) ]
= 200 +
(½) 400 +
(½) (½) [ 700 + 500 + (1/5) (1/2) 1300 ] +
(½) (½) (½) [ 300 + (1/5) (1/2) (1/4) 2500 ]
= 200 + 200 + 332.5 + 45.3125 = 777.8125
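The arithmetic of Example 2.2 can be reproduced with a short cost function. This is a sketch under assumptions: the filter costs at N1 (200, 400, 1300, 2500), the scale factors (1/5, 1/2, 1/4) and the link costs (700, 500, 300) are read off the numbers in the example, since the figure itself is not in the text, and `plan_cost` is a hypothetical helper name.

```python
from fractions import Fraction as Fr

# Values read off the arithmetic of Example 2.2 (assumed, not stated as a table).
COST_AT_N1 = {"F1": Fr(200), "F2": Fr(400), "F3": Fr(1300), "F4": Fr(2500)}
SEL = {name: Fr(1, 2) for name in COST_AT_N1}
GAMMA = [Fr(1, 5), Fr(1, 2), Fr(1, 4)]   # cost scale factor from Ni to Ni+1
LINK = [Fr(700), Fr(500), Fr(300)]       # l1, l2, l3

def plan_cost(placement):
    """Expected per-tuple cost; placement maps filter name -> node (1-based)."""
    m = len(LINK) + 1
    total, surviving, scale = Fr(0), Fr(1), Fr(1)
    for node in range(1, m + 1):
        here = [f for f, p in placement.items() if p == node]
        # Run this node's filters in rank order: cost / (1 - selectivity).
        here.sort(key=lambda f: COST_AT_N1[f] / (1 - SEL[f]))
        for f in here:
            total += surviving * scale * COST_AT_N1[f]
            surviving *= SEL[f]
        if node < m:
            total += surviving * LINK[node - 1]   # ship survivors upward
            scale *= GAMMA[node - 1]
    return total

example_2_2 = {"F1": 1, "F2": 1, "F3": 3, "F4": 4}
print(float(plan_cost(example_2_2)))   # 777.8125
```

The same function can score any other placement of the four filters, which makes it easy to compare candidate plans.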
Filter Placement
1. Greedy algorithm
2. Optimal algorithm
Greedy algorithm
Notation
c(P, i) = part of the total cost c(P) incurred at Ni
including transmission from Ni to Ni+1
network link Ni to Ni+1 modeled as a filter Fli :
s(Fli) = 0, c(Fli, 1) = li
Example 3.3
At N1: r(F1) = 400, r(F2) = 800, r(F3) = 2600, r(F4) = 5000; r(Fl1) = 700 > r(F1), so F1 runs at N1
At N2: r(F2) = 160, r(F3) = 520, r(F4) = 1000; r(Fl2) = 500 > r(F2), so F2 runs at N2
At N3: r(F3) = 260, r(F4) = 500; r(Fl3) = 300 > r(F3), so F3 runs at N3
At N4: F4 is the only filter left, so it runs at N4
c(P) = 200 + 350 + 40 + 125 + 32.5 + 37.5 + 7.8125 = 792.8125
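The node-by-node decisions of Example 3.3 can be sketched in a few lines. This is an illustrative reading of the greedy rule rather than the paper's pseudocode: at each node, the outgoing link is treated as a filter of selectivity 0 whose rank equals the link cost, and only filters with a smaller rank run locally. Input values are inferred from the example's arithmetic, and `greedy` is a hypothetical name.

```python
from fractions import Fraction as Fr

# Inputs inferred from the arithmetic of the examples (assumed values).
COST_AT_N1 = {"F1": Fr(200), "F2": Fr(400), "F3": Fr(1300), "F4": Fr(2500)}
SEL = {name: Fr(1, 2) for name in COST_AT_N1}
GAMMA = [Fr(1, 5), Fr(1, 2), Fr(1, 4)]
LINK = [Fr(700), Fr(500), Fr(300)]

def greedy(cost_at_n1, sel, gamma, link):
    """Return (placement, total cost), deciding node by node from N1 upward."""
    m = len(link) + 1
    remaining = set(cost_at_n1)
    placement = {}
    total, surviving, scale = Fr(0), Fr(1), Fr(1)
    for node in range(1, m + 1):
        def rank(f):
            # Rank of f when executed on the current node.
            return scale * cost_at_n1[f] / (1 - sel[f])
        if node < m:
            # Keep only filters that rank below the outgoing link.
            here = [f for f in remaining if rank(f) < link[node - 1]]
        else:
            here = list(remaining)   # the last node runs whatever is left
        for f in sorted(here, key=rank):
            placement[f] = node
            total += surviving * scale * cost_at_n1[f]
            surviving *= sel[f]
            remaining.discard(f)
        if node < m:
            total += surviving * link[node - 1]
            scale *= gamma[node - 1]
    return placement, total

placement, total = greedy(COST_AT_N1, SEL, GAMMA, LINK)
print(placement, float(total))   # total cost 792.8125, as in Example 3.3
```

Because each decision looks only at the next link, the rule cannot see that a cheap link may be followed by much cheaper computation further up, which is how it ends up with a costlier plan than the one in Example 2.2.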
Optimal algorithm
Notation
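network link Ni to Ni+1 modeled as a filter Fli, where γi = c(F, i+1) / c(F, i) :
s(Fli) = γi , c(Fli, 1) = li / (γ1 γ2 … γi-1)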
Optimal algorithm
Short-circuiting
Rank
Cost scaleup
Optimal algorithm
Example 3.7
Model links as filters
r(Fl2,4) = 4571.42857142857
r(F1) = 400, r(F2) = 800, r(F3) = 2600, r(F4) = 5000, r(Fl1 ) = 875, r(Fl2,4 ) = 4571.4
r(F1) < r(F2) < r(Fl1 ) < r(F3) < r(Fl2,4 ) < r(F4)
c(P) = 200 + 200 + 175 + 65 + 100 + 7.8125 = 747.8125
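The steps of Example 3.7 can be sketched as follows. This is an illustrative reconstruction, not the paper's pseudocode: each link becomes a filter whose selectivity is the scale factor of that hop and whose cost is the link cost rescaled to N1, consecutive link filters with out-of-order ranks are merged (short-circuited), and each real filter is placed according to where its rank falls among the link filters. Input values are inferred from the examples, scale factors are assumed to be strictly below 1, and all function names are hypothetical.

```python
from fractions import Fraction as Fr

# Inputs inferred from the arithmetic of Examples 2.2 and 3.7 (assumed values).
COST_AT_N1 = {"F1": Fr(200), "F2": Fr(400), "F3": Fr(1300), "F4": Fr(2500)}
SEL = {name: Fr(1, 2) for name in COST_AT_N1}
GAMMA = [Fr(1, 5), Fr(1, 2), Fr(1, 4)]   # assumed strictly below 1
LINK = [Fr(700), Fr(500), Fr(300)]

def rank(cost, sel):
    return cost / (1 - sel)

def link_filters(gamma, link):
    """Link Ni -> Ni+1 becomes a filter with selectivity gamma_i and N1-scaled cost."""
    out, scale = [], Fr(1)
    for i, (g, l) in enumerate(zip(gamma, link), start=1):
        out.append({"sel": g, "cost": l / scale, "dst": i + 1})
        scale *= g
    return out

def short_circuit(links):
    """Merge consecutive link filters while their ranks are out of order."""
    stack = []
    for cur in links:
        cur = dict(cur)
        while stack and rank(stack[-1]["cost"], stack[-1]["sel"]) > rank(cur["cost"], cur["sel"]):
            prev = stack.pop()
            cur = {"sel": prev["sel"] * cur["sel"],
                   "cost": prev["cost"] + prev["sel"] * cur["cost"],
                   "dst": cur["dst"]}
        stack.append(cur)
    return stack

def optimal_placement(cost_at_n1, sel, gamma, link):
    groups = short_circuit(link_filters(gamma, link))
    placement = {}
    for f, c in cost_at_n1.items():
        node = 1
        for g in groups:
            # A filter ranked after a link group runs where that group ends.
            if rank(g["cost"], g["sel"]) < rank(c, sel[f]):
                node = g["dst"]
        placement[f] = node
    return placement

def plan_cost(placement, cost_at_n1, sel, gamma, link):
    m = len(link) + 1
    total, surviving, scale = Fr(0), Fr(1), Fr(1)
    for node in range(1, m + 1):
        here = sorted((f for f, p in placement.items() if p == node),
                      key=lambda f: rank(cost_at_n1[f], sel[f]))
        for f in here:
            total += surviving * scale * cost_at_n1[f]
            surviving *= sel[f]
        if node < m:
            total += surviving * link[node - 1]
            scale *= gamma[node - 1]
    return total

groups = short_circuit(link_filters(GAMMA, LINK))
placement = optimal_placement(COST_AT_N1, SEL, GAMMA, LINK)
print(placement, float(plan_cost(placement, COST_AT_N1, SEL, GAMMA, LINK)))
```

With these inputs the links l2 and l3 are merged into a single link filter of rank 32000/7 ≈ 4571.43, and the resulting plan runs F1 and F2 at N1, F3 at N2 and F4 at N4.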
Extensions
Correlated filters
Tree hierarchies
Joins
Other extensions
Correlated filters
Definition
Conditional selectivity
s(F|Q) = the fraction of tuples that satisfy F given that they satisfy all the filters in Q
Reference
Optimal ordering of correlated filters at a single node
NP-hard
a known algorithm is guaranteed to find a plan whose cost is at most 4 times the optimal cost
Approximation ratio of 4
the best possible unless P = NP
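To illustrate what conditional selectivity means in practice, the sketch below estimates s(F|Q) from a sample of tuples. The sample data, the two predicates and the function name are all hypothetical, loosely modeled on the "dark" and "movement" filters of the surveillance scenario.

```python
def conditional_selectivity(tuples, f, q):
    """Estimate s(F|Q): share of tuples passing f among those passing every filter in q."""
    passed_q = [t for t in tuples if all(g(t) for g in q)]
    if not passed_q:
        raise ValueError("no tuple satisfies Q; s(F|Q) is undefined on this sample")
    return sum(1 for t in passed_q if f(t)) / len(passed_q)

# Hypothetical sample of (intensity, motion) readings.
sample = [(10, 5), (20, 50), (30, 60), (200, 70), (220, 2), (15, 80)]

def is_dark(t):
    return t[0] < 50

def has_motion(t):
    return t[1] > 40

unconditional = conditional_selectivity(sample, has_motion, [])
given_dark = conditional_selectivity(sample, has_motion, [is_dark])
print(unconditional, given_dark)
```

When the two numbers differ, the filters are correlated, and ordering them by unconditional rank alone is no longer guaranteed to be optimal.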
Correlated filters
Definition
Short-circuiting
Optimal solution
Tree hierarchy
Each of the queries operates on different data.
No computation or transmission is shared among them.
Joins
Problem
k different data streams acquired by N1
Solution
Reference
Sliding-window join
MJoin operator
at a single node
join tree is left as future work
Query
W1 and W2 represent the lengths of the windows (time-based or tuple-based) on streams S1 and S2.
Joins
Join operator
Illustration
(Figure: input streams with rates r1 and r2 enter the join; the output rate is s(⋈) r1 r2)
Selectivity
s(⋈) = the fraction of the cross product that occurs in the join result
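A minimal sketch of this definition, assuming the current window contents of the two streams are available as plain lists. The data, the equality predicate, the input rates and the function name are hypothetical.

```python
def join_selectivity(left, right, predicate):
    """Fraction of the cross product left x right that satisfies the join predicate."""
    if not left or not right:
        raise ValueError("both inputs must be non-empty")
    matches = sum(1 for a in left for b in right if predicate(a, b))
    return matches / (len(left) * len(right))

# Hypothetical window contents for S1 and S2.
w1 = [1, 2, 3, 4]
w2 = [2, 4, 6]
sel = join_selectivity(w1, w2, lambda a, b: a == b)
print(sel)             # 2 matching pairs out of 12

# Output rate in the slide's model, for hypothetical input rates r1 and r2.
r1, r2 = 10.0, 5.0
print(sel * r1 * r2)
```

Unlike a filter, a join can produce more output than input, so the product of the selectivity with r1 and r2 may exceed either input rate.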
Cost
Joins
Notation
Fi = filters that can be applied on Si either before or after the join
| Fi | = ni
F12 = filters that can be applied only after the join
Joins
Time complexity : O(n2n1m(n+m)log(n+m))
Extensions
Constrained nodes
Per-filter cost scaling
c(F, i+1) / c(F, i) may be different for different F.
Modeling network links as filters no longer applies.
The problem becomes NP-hard.
Conclusion
Environment
Operator placement problem
Tradeoff
Lower computational costs
Put on the nodes higher up
Lower transmission cost
Put on the nodes lower down
Provide
Greedy algorithm and optimal algorithm
Extensions
Lemma 3.1
by (2)
Theorem 3.2
F1 in P is chosen according to the theorem.
∵ Lemma 3.1 and s(Fl1) = 0 ∴ ∃ F’1 in P’ s.t. c( P’, 1 ) < c( P, 1 )
∵ Theorem 2.1 ∴ c( P, 1 ) ≤ c( P’, 1 ) → contradiction
Lemma 3.4
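\[
l_{i-1} = \prod_{j=1}^{i-2} \gamma_j \left[ l_{i-1} \left( \prod_{j=1}^{i-2} \gamma_j \right)^{-1} \right] = \prod_{j=1}^{i-2} \gamma_j \left[ c(F^{l}_{i-1}, 1) \right],
\qquad \text{where } \gamma_j = c(F, j+1) / c(F, j)
\]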
Theorem 3.5
F1 in P is chosen according to the theorem.
∵ Lemma 3.4 ∴ ∃ P’ s.t. c( P’, 1 ) < c( P, 1 )
∵ Theorem 2.1 ∴ c( P, 1 ) ≤ c( P’, 1 ) → contradiction
Lemma 3.6
Suppose
and the best
Moving the filters on node Ni to Ni-1
Moving the filters on node Ni to Ni+1
∵ P is the best plan ∴ c( P) < c( P’) , c( P) < c( P”)
→
implies → contradiction