Multi-Query Optimization

Prasan Roy
Indian Institute of Technology - Bombay
Overview

Multi-Query Optimization: What?
– Problem statement

Multi-Query Optimization: Why?
– Application scenarios

Multi-Query Optimization: How?
– A cost-based practical approach
– Prototyping Multi-Query Optimization
• On MS SQL-Server at Microsoft
• Research prototype at IIT-Bombay
Multi-Query Optimization: What?
Exploit common subexpressions (CSEs) in query optimization
Consider DAG execution plans in addition to tree execution plans
Example
[Figure: best plan for A JOIN B JOIN C (over relations A, B, C) and best plan for B JOIN C JOIN D (over relations B, C, D)]
Example (contd)
Alternative:
[Figure: a single plan over relations A, B, C, D with one node marked "Common Subexpression"]
Multi-Query Optimization: Why?

Queries on views, nested queries, …
Overlapping query batches generated by applications
Update expressions for materialized views
Query invocations with different parameters
...
Practical solutions needed!
Multi-Query Optimization: How?

Set up the search space
– Identify the common subexpressions

Explore the search space efficiently
– Find the best way to exploit the common subexpressions
Problems

Materializing and sharing a CSE is not necessarily cheaper
Mutually exclusive alternatives
(A JOIN B JOIN C)
(B JOIN C JOIN D)
(C JOIN D JOIN E)
What to share: (B JOIN C) or (C JOIN D)?
Huge search space!
Earlier Work: Practical Solutions
As early as 1976
Preprocess query before optimization [Hall, IBM-JRD76]
As late as 1998
Postprocess optimized plans [Subramanian and Venkataraman, SIGMOD98]
Query optimizer is not aware of the sharing!
Earlier Work: Theoretical Studies
[Sellis, TODS88], [Cosar et al., CIKM93], [Shim et al., DKE94], ...
Set of queries {Q1, Q2, …, Qn}
For each query Qi, set of execution plans {Pi1, Pi2, …, Pim}
Pij is a set of tasks from a common pool
Pick a plan for each query such that the cost of tasks in the union is minimized

Not integrated with existing optimizers, no practical study
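The formulation above can be made concrete with a brute-force sketch. This is only an illustration of the problem statement, not an algorithm from the cited papers: the task names and costs are made up, and `best_plan_combination` is a hypothetical helper.

```python
from itertools import product

def best_plan_combination(plans_per_query, task_cost):
    """Pick one plan per query so that the cost of the union of tasks is minimal.

    plans_per_query: one list per query; each plan is a frozenset of task names.
    task_cost: task name -> cost (a shared task is paid for only once).
    """
    best_cost, best_choice = float("inf"), None
    for choice in product(*plans_per_query):   # one plan per query
        tasks = set().union(*choice)           # shared tasks collapse here
        cost = sum(task_cost[t] for t in tasks)
        if cost < best_cost:
            best_cost, best_choice = cost, choice
    return best_cost, best_choice

# Made-up costs: each query's locally cheapest plan (cost 10) shares nothing.
task_cost = {"AB": 5, "(AB)C": 5, "BC": 8, "A(BC)": 4,
             "CD": 5, "B(CD)": 5, "(BC)D": 4}
q1_plans = [frozenset({"AB", "(AB)C"}), frozenset({"BC", "A(BC)"})]
q2_plans = [frozenset({"CD", "B(CD)"}), frozenset({"BC", "(BC)D"})]

cost, choice = best_plan_combination([q1_plans, q2_plans], task_cost)
print(cost, [sorted(p) for p in choice])
```

With these numbers the two locally best plans together cost 20, while the pair that shares task BC costs 16. The enumeration is exponential in the number of queries, which is the search-space problem noted earlier.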
Microsoft Experience
with Paul Larson, Microsoft Research
Prototyping MQO on SQL-Server
Add multi-query optimization capability to SQL-Server
Well integrated with the existing optimization framework
– another optimization level
– minimal changes, minimal extra lines of code

First cut: exhaustive
– How slow can it be?

A working prototype by the end of the summer
What (almost) already exists in the SQL-Server Optimizer
AND/OR Query-DAG representation of plan space
[Figure: Query DAG over relations A, B, C, D; legend: Group (OR node), Op (AND node)]
What actually exists in the SQL-Server Optimizer

Relations cloned for each use
[Figure: Query DAG over A, B1, C1, B2, C2, D]
Preprocessing Step: Query-DAG Unification
Performed in a bottom-up traversal
[Figure: Query DAG over A, B1, C1, B2, C2, D]
Common Subexpression Identification

Unified nodes are CSEs
[Figure: unified Query DAG over A, B, C, D with one node marked "Common Subexpression"]
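A minimal sketch of bottom-up unification, under simplifying assumptions: expressions are plain nested tuples rather than the optimizer's Query DAG, joins are treated as commutative only (no associativity), and a node counts as a CSE when it ends up with more than one distinct parent. The function names are hypothetical.

```python
def unify(expr, parents):
    """Return a canonical name for expr, visiting children first (bottom-up).

    expr is a relation name or a tuple (op, left, right). Equivalent
    subexpressions map to the same canonical name, i.e. they are unified.
    """
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    # Sorting the children makes B JOIN C and C JOIN B the same node.
    kids = sorted([unify(left, parents), unify(right, parents)])
    node = f"({kids[0]} {op} {kids[1]})"
    for k in kids:
        parents.setdefault(k, set()).add(node)
    return node

def common_subexpressions(queries):
    """Non-leaf unified nodes that are used by more than one parent."""
    parents = {}
    for q in queries:
        unify(q, parents)
    return sorted(n for n, ps in parents.items()
                  if len(ps) > 1 and n.startswith("("))

q1 = ("JOIN", "A", ("JOIN", "B", "C"))
q2 = ("JOIN", ("JOIN", "C", "B"), "D")
print(common_subexpressions([q1, q2]))  # ['(B JOIN C)']
```

Base relations B and C are also shared, but each has the single parent (B JOIN C) after unification, so only the join is reported.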
Exploring the Search Space: A Naïve Algorithm

For each set S of common subexpressions
– materialize each node in S
– MatCost(S) = sum of materialization costs of the nodes in S
– invoke optimizer to find the best plan for the root and for each node in S
– CompCost(S) = sum of costs of above plans
– Cost(S) = MatCost(S) + CompCost(S)

Pick S with the minimum Cost
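The steps above can be sketched on a toy AND/OR DAG. Everything concrete here is an assumption for illustration: the DAG, the operator, materialization and reuse costs, the pseudo-root that combines the three queries, and the rule that every use of a materialized node pays its reuse cost. The recursive `compute_cost` stands in for the optimizer invocation.

```python
from itertools import chain, combinations

# Toy AND/OR DAG: group -> alternatives, each (operator_cost, child_groups).
DAG = {
    "A": [(10, [])], "B": [(10, [])], "C": [(10, [])],
    "D": [(10, [])], "E": [(10, [])],
    "BC": [(20, ["B", "C"])],
    "CD": [(20, ["C", "D"])],
    "ABC": [(15, ["A", "BC"])],
    "BCD": [(15, ["BC", "D"]), (15, ["B", "CD"])],
    "CDE": [(15, ["CD", "E"])],
    "ROOT": [(0, ["ABC", "BCD", "CDE"])],  # pseudo-root over the query batch
}
MAT_COST = {"BC": 5, "CD": 8}    # assumed cost of writing the result
REUSE_COST = {"BC": 5, "CD": 5}  # assumed cost of reading it back

def compute_cost(group, S):
    """Cost of the best plan that computes `group`, given materialized set S."""
    return min(op + sum(use_cost(c, S) for c in kids) for op, kids in DAG[group])

def use_cost(group, S):
    """Cost a parent pays for its input: reuse if materialized, else recompute."""
    return REUSE_COST[group] if group in S else compute_cost(group, S)

def total_cost(S):
    mat = sum(MAT_COST[n] for n in S)                                   # MatCost(S)
    comp = compute_cost("ROOT", S) + sum(compute_cost(n, S) for n in S)  # CompCost(S)
    return mat + comp                                                   # Cost(S)

def naive_mqo(cses):
    """Enumerate every subset of the CSEs and keep the cheapest."""
    subsets = chain.from_iterable(
        combinations(cses, k) for k in range(len(cses) + 1))
    return min((frozenset(s) for s in subsets), key=total_cost)

best = naive_mqo(["BC", "CD"])
print(sorted(best), total_cost(best))  # ['BC'] 170
```

With these numbers, sharing nothing costs 195, sharing only BC costs 170, only CD costs 173, and sharing both costs 183, so materializing every CSE is not the best choice. The loop runs over 2^k subsets for k CSEs.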
Doing Better: Incremental Reoptimization
Goal: best plan for Si → best plan for Sj
Observation
– Best plans change only for the ancestors of nodes in Si XOR Sj

Algorithm:
– Propagate changed costs in bottom-up topological order from nodes in Si XOR Sj
– Update min-cost plan at each node visited
– Do not propagate further up if min-cost plan remains unchanged at a node
Work done at IIT-Bombay
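A sketch of the propagation step, assuming a toy AND/OR DAG with made-up costs and a precomputed topological numbering (children before parents). It tracks only the min cost per group, not the plan itself, and `reoptimize` is a hypothetical name.

```python
import heapq

# Toy AND/OR DAG: group -> alternatives, each (operator_cost, child_groups).
DAG = {
    "A": [(10, [])], "B": [(10, [])], "C": [(10, [])],
    "D": [(10, [])], "E": [(10, [])],
    "BC": [(20, ["B", "C"])], "CD": [(20, ["C", "D"])],
    "ABC": [(15, ["A", "BC"])],
    "BCD": [(15, ["BC", "D"]), (15, ["B", "CD"])],
    "CDE": [(15, ["CD", "E"])],
    "ROOT": [(0, ["ABC", "BCD", "CDE"])],
}
REUSE_COST = {"BC": 5, "CD": 5}  # assumed cost of reading a materialized result
ORDER = ["A", "B", "C", "D", "E", "BC", "CD", "ABC", "BCD", "CDE", "ROOT"]
TOPO = {g: i for i, g in enumerate(ORDER)}
PARENTS = {}
for g, alts in DAG.items():
    for _, kids in alts:
        for k in kids:
            PARENTS.setdefault(k, set()).add(g)

def recompute(g, cost, S):
    """Min cost of computing g from the current child costs, given set S."""
    def use(c):
        return REUSE_COST[c] if c in S else cost[c]
    return min(op + sum(use(c) for c in kids) for op, kids in DAG[g])

def full_optimize(S):
    cost = {}
    for g in ORDER:  # children are costed before their parents
        cost[g] = recompute(g, cost, S)
    return cost

def reoptimize(cost, S_old, S_new):
    """Update `cost` in place for S_new; return the groups that were revisited."""
    heap, queued, visited = [], set(), []
    def push_parents(g):
        for p in PARENTS.get(g, ()):
            if p not in queued:
                queued.add(p)
                heapq.heappush(heap, (TOPO[p], p))
    for g in S_old ^ S_new:  # only these nodes changed their use cost
        push_parents(g)
    while heap:
        _, g = heapq.heappop(heap)  # lowest topological number first
        visited.append(g)
        new = recompute(g, cost, S_new)
        if new != cost[g]:  # unchanged cost: stop propagating upward
            cost[g] = new
            push_parents(g)
    return visited

cost = full_optimize(frozenset())
visited = reoptimize(cost, frozenset(), frozenset({"BC"}))
print(visited, cost["ROOT"])  # ['ABC', 'BCD', 'ROOT'] 125
```

Only the ancestors of BC are revisited; CDE and CD keep their earlier costs, and the result matches a full reoptimization for {BC}.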
Incremental Optimization: Example

Si = ∅
[Figure: Query DAG over A, B, C, D with the min-cost plan marked]
Incremental Optimization: Example

Si = ∅
Sj = {(B JOIN C)}
[Figure: Query DAG over A, B, C, D; labels: "Now materialized", "Previous min-cost", "New min-cost"]
Current Status

A first-cut implementation working
– Lines of C++ code added: 1500 approx.
Future Work
Performance tuning and smarter data structures needed
Ways to restrict enumeration taking DAG structure into account

Research at IIT-Bombay: Heuristics for MQO
with S. Sudarshan, S. Seshadri
A Greedy Heuristic

Pick nodes for materialization one at a time, in “benefit” order
Benefit(n) = reduction in cost on materialization of n
Benefit computation is expensive
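A minimal sketch of the greedy loop, assuming a hypothetical `cost(S)` callback that returns the best plan cost with the nodes in S materialized (in practice, an optimizer invocation). The cost table below is made up for illustration.

```python
def greedy(candidates, cost):
    """Pick nodes to materialize one at a time, highest benefit first.

    cost(S) -> best plan cost with the nodes in frozenset S materialized.
    """
    chosen = frozenset()
    remaining = set(candidates)
    while remaining:
        current = cost(chosen)
        # Benefit(n) = reduction in cost when n is added to the chosen set.
        benefit, node = max((current - cost(chosen | {n}), n) for n in remaining)
        if benefit <= 0:  # no remaining node reduces the cost
            break
        chosen |= {node}
        remaining.remove(node)
    return chosen

# Made-up costs for two candidate nodes.
COSTS = {
    frozenset(): 195,
    frozenset({"BC"}): 170,
    frozenset({"CD"}): 173,
    frozenset({"BC", "CD"}): 183,
}
result = greedy(["BC", "CD"], COSTS.__getitem__)
print(sorted(result))  # ['BC']
```

Each round calls `cost` once per remaining candidate, so k candidates can need on the order of k² cost evaluations. That is why benefit computation dominates the running time.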
Monotonicity Assumption

Benefit of a node does not increase due to materialization of other nodes
Exploited to avoid some benefit computations
Optimization costs decrease by 90%
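One way to exploit the assumption is lazy evaluation with a priority queue: a benefit computed in an earlier round is an upper bound on the current one, so a node needs recomputing only when it reaches the top of the queue. The sketch below is an illustration under that assumption, with a hypothetical `benefit(n, chosen)` callback and made-up numbers; it is not claimed to be the exact procedure used in the prototype.

```python
import heapq

def lazy_greedy(candidates, benefit):
    """Greedy selection that recomputes a benefit only when it might matter."""
    chosen = frozenset()
    rnd = 0  # number of nodes picked so far
    # Entries: (-benefit, node, round in which the benefit was computed).
    heap = [(-benefit(n, chosen), n, rnd) for n in candidates]
    heapq.heapify(heap)
    while heap:
        neg_b, n, when = heapq.heappop(heap)
        if when < rnd:
            # Stale upper bound: refresh it and put the node back.
            heapq.heappush(heap, (-benefit(n, chosen), n, rnd))
            continue
        if -neg_b <= 0:
            break
        # Fresh value on top: under monotonicity no other node can beat it.
        chosen |= {n}
        rnd += 1
    return chosen

calls = []  # records every benefit computation

def benefit(n, chosen):
    """Made-up monotone benefits: Y drops from 20 to 15 once X is chosen."""
    calls.append(n)
    if n == "Y" and "X" in chosen:
        return 15
    return {"X": 30, "Y": 20, "Z": 10, "W": 4}[n]

picked = lazy_greedy(["X", "Y", "Z", "W"], benefit)
print(sorted(picked), len(calls))  # ['W', 'X', 'Y', 'Z'] 7
```

The plain greedy loop would compute 4 + 3 + 2 + 1 = 10 benefits on this input; the lazy version computes 7. If monotonicity does not hold, a stale bound may be too low and the choice can differ from plain greedy.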
A Postpass Heuristic: Volcano-SH

No change in Volcano best plan computation
Cost-based materialization of nodes in best Volcano plan
Implementation easy
Low overhead
Optimizer is not aware of sharing
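A simplified sketch of a cost-based postpass over an already chosen plan. The break-even test used here (materialize a node when computing it once, writing it, and reading it back for every use is cheaper than recomputing it for every use) is an assumption made for illustration, not necessarily the exact rule in Volcano-SH; the plan, costs, and function name are hypothetical.

```python
def postpass_materialize(plan, order, mat_cost, reuse_cost):
    """Decide, bottom-up, which nodes of a fixed plan to materialize.

    plan: node -> (operator_cost, children, number_of_uses_in_the_plan)
    order: nodes listed children-first.
    """
    cost, materialized = {}, set()
    for n in order:
        op, kids, uses = plan[n]
        # Children already marked for materialization are read, not recomputed.
        cost[n] = op + sum(
            reuse_cost[k] if k in materialized else cost[k] for k in kids)
        shared = cost[n] + mat_cost[n] + uses * reuse_cost[n] if uses > 1 else None
        if shared is not None and shared < uses * cost[n]:
            materialized.add(n)
    return materialized

# Made-up consolidated plan for A JOIN (B JOIN C) and (B JOIN C) JOIN D.
plan = {
    "A": (10, [], 1), "B": (10, [], 1), "C": (10, [], 1), "D": (10, [], 1),
    "BC": (20, ["B", "C"], 2),  # used by both queries
    "ABC": (15, ["A", "BC"], 1),
    "BCD": (15, ["BC", "D"], 1),
}
order = ["A", "B", "C", "D", "BC", "ABC", "BCD"]

cheap = postpass_materialize(plan, order, {"BC": 5}, {"BC": 5})
costly = postpass_materialize(plan, order, {"BC": 50}, {"BC": 5})
print(cheap, costly)  # {'BC'} set()
```

The second call shows that sharing is not always a win: with a materialization cost of 50, recomputing BC twice (80) is cheaper than sharing it (100). The plan shape itself never changes, which is the sense in which the optimizer stays unaware of sharing.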
A Volcano Variant: Volcano-RU

Volcano best plan search aware of best plans for earlier queries
– Cost-based materialization of best plan nodes that are used by later queries
Implementation easy
Low overhead
Local decisions, plan quality sensitive to query sequence
Experimental Conclusion

Greedy
– Expensive, but practical
– Overheads typically offset by plan quality
• especially for expensive “canned” queries
– Almost linear scaleup with query batch size
• typically, only the width of the Query DAG affected

Volcano-RU
– Mostly better than Volcano-SH, same overhead
– Negligible overhead over Volcano
• recommended for cheap but complex queries
Conclusion
Multi-query optimization is needed
Multi-query optimization is practical!
Multi-query optimization is an easy next step for DAG-based optimizers
