Eddies: Continuously Adaptive Query Processing

Eddies: Continuously
Adaptive Query Processing
Ross Rosemark
Goals of Presentation
 What are traditional query optimizers?
 What are the problems with these
optimizers?
 What is an eddy?
 How does an eddy work?
 Discuss how an eddy self tunes query
plans?
Traditional Query Optimizer
 Metadata (statistics) about the distribution
of data is collected by the query optimizer
 Based on the metadata the query
optimizer determines the most energy
efficient query plan
 Query plan is executed.
 This is good for snapshot queries (short
lived queries)
Problems
 The problem with this approach is that data became streaming
 Internet
 Sensor nodes
 Etc.
 The query plan chosen by the query optimizer could eventually become
inefficient if the metadata statistics change
 Cost of operators
 Operator selectivies could change
 Rate tuples arrive from inputs
 Also it’s difficult to just re-optimize a query
 Determining when to re-optimize the query is a difficult research issue that has
not been addressed.
 Must determine not only when you should re-optimize but that re-optimizing still leaves
produces the same state
Example
 Assume a query that asks for Employees
age and salary > 100000 is issued into the
sensor network.
 Initially the predicate salary > 100000 will be
very selective… hence eliminating a lot of
tuples
 As older employees in the database start to be
read the predicate salary > 100000 will be less
selective
Lets show some things Eddie must
consider.
 Eddie should:
 Increase synchronization
 Increase the moments of Symmetry
Synchronization Barriers
 Synchronization barriers
 Where one operation hinders the speed of another operations
 Extreme example
 Assume you have a merge join on two duplicate-free inputs. (slowlo and




fasthi)
Assume that during processing the next tuple is always consumed from the
relation that had the lowest values
Assume that slowlo is a slowly delivered external relation with many low
columns in it’s bandwith
Assume that fasthi is high bandwith (i.e. delivers tuples fast) and has only
high values in it’s column
In this example fasthi is delayed why slowlo delivers tuples
 Known as a synchronization barrier
 Desirable to minimize the number of Synchronization barriers
Moments of Symmetry
 You can only re-optimize at a moment
of symmetry
 A moment of symmetry is when the
query is executed to a point that the
optimizer can change the query plan
without affecting the way the query
plans predicates are performed
 Typically happens in joins
 Example
 Assume you have a nested loop join
with inner relation R and outer relation
S.
 In this example you can only re-optimize
this join when R is completely scanned.
 It is possible to re-optimize in the middle
of scanning S but the join algorithm
would then have to be changed
Eddy


An Eddy was designed to dynamically reoptimize queries.
The authors implemented Eddies in a River

A river is a shared nothing parallel query
processing framework that dynamically
adapts to fluctuations and workloads

An eddy is implemented via a module in a
river containing an arbitrary number of input
relations, a number of participating unary
and binary modules, and a single output
relation

An Eddy also maintains a fixed size buffer of
tuples that need to be processed

Each operator takes two tuples, processes
them and delivers them back to the eddy
Eddy (Cont)
 A tuple in a eddy is associated with a tuple descriptor
 Contains a vector of Ready bits and Done bits
 The Eddy ships the tuple to only the operators that have
the Ready bits turned on
 After an operators is processed it’s Done bits are set
 If all done bits are set the tuple is sent to the Eddy’s output
 Else it’s sent to another operators
Eddy (Cont)
 So how do you route tuples to the different
Eddy operators
 In this paper they look at multiple different
ways
Naïve Scheme
 Eddies buffer is implemented as a priority queue
 Recall the buffer is used to store tuples that should be processed by the
Eddies
 When a tuple enters a buffer it’s priority is set to low
 After it’s processed by an operator in the Eddy and returned to the
buffer it’s priority is set to high
 This ensures that tuples do not get stuck in the Eddy.
I.e. starvation
 This scheme dynamically adjusts to work required of operators
 Operators that are slower (i.e. take 4 seconds vs. 1 second will receive
less tuples)
 Note each operator has a fixed size queue
 Once queue is filled up no more tuples can be inserted into queue
Lottery Scheme
 Each time a tuple is routed to a operator the operator is credited with
a ticket
 When the operator returns a tuple the eddy one ticket is debited
from the eddies running count for that operator
 Tracks how efficiently a operator drains tuples from the system
 When a tuple is to be routed to an operator the Eddy holds a lottery
 Only the operators that have their Ready bit sets can participate in the
lottery
 An operator’s chance of “winning the lottery” and receiving
the tuple corresponds to the count of tickets for that operator.
 Dynamically adjusts to selectivity of operators
Window Scheme
 Problem with lottery scheme is that it uses to much past info
 Problem: An operator that gained a lot of ticket initially but then became
slow
 In this scheme the lottery scheme is modified such that the lottery
only looks at tickets gained by an operator in a fixed window.
 Keeps track of two types of tickets
 Banked tickets
 Used when running the lottery
 Escrow tickets
 Used to measure efficiency during the window
 At the end of a window
 Banked Tickets = Escrow Tickets
 Escrow Tickets = 0
 Ensure operators re-approve themselves each window
Comparison
 Shows that due to
“Fluid Dynamics” (i.e.
the varying rates of
operators) the Naïve
approach naturally
adjusts based on the
cost of operators
 Shows that Lottery
also adjusts based on
workload
Comparison


Naïve eddies does not adjust based on selectivity

Naïve performs between the best and worse
Lottery does adjust based on selectivity
Joins
 Shows that Lottery
performs nearly
optimally
 Naïve performs
between the best and
worst
Joins (Cont)
 All joins are ripple
joins
 Change selectivity of
join predicate
 In all cases the Eddy
with the Lottery is
close to optimal
Window Scheme
 Both cases are
related to different
query plans
 Shows eddy is in
between the best and
worst case
Adaptability
 Toggle the cost of the
operator 3 times
throughout
experiment
 Notice that Eddy
switches how many
tuples are first
processed by that
operator
Problems
 Does not work well when
there is initially a long
delay at an operator
 Eddy gives all tuples to
operator because
operator is not returning
tuples that satisfy the join
predicates criteria
 Not until after the delay
 Notice eventually this
problem is figured out by
Eddy
 Just make take some time