Multi-Query Optimization
Prasan Roy
Indian Institute of Technology - Bombay
Overview
Multi-Query Optimization: What?
– Problem statement
Multi-Query Optimization: Why?
– Application scenarios
Multi-Query Optimization: How?
– A cost-based practical approach
– Prototyping Multi-Query Optimization
• On MS SQL-Server at Microsoft
• Research prototype at IIT-Bombay
Multi-Query Optimization:
What?
Exploit common subexpressions (CSEs) in
query optimization
Consider DAG execution plans in addition
to tree execution plans
Example
B
A
B
Best Plan for
A JOIN B JOIN C
C
C
D
Best Plan for
B JOIN C JOIN D
Example (contd)
Alternative:
D
A
B
C
Common Subexpression
Multi-Query Optimization:
Why?
Queries on views, nested queries, …
Overlapping query batches generated by
applications
Update expressions for materialized views
Query invocations with different
parameters
...
Practical solutions needed!
Multi-Query Optimization:
How?
Set up the search space
– Identify the common subexpressions
Explore the search space efficiently
– Find the best way to exploit the
common subexpressions
Problems
Materializing and sharing a CSE not
necessarily cheaper
Mutually exclusive alternatives
(A JOIN B JOIN C)
(B JOIN C JOIN D)
(C JOIN D JOIN E)
What to share: (B JOIN C) or (C JOIN D) ?
Huge search space!
Earlier Work:
Practical Solutions
As early as 1976
Preprocess query before optimization
[Hall, IBM-JRD76]
As late as 1998
Postprocess optimized plans
[Subramanium and Venkataraman, SIGMOD98]
Query optimizer is not aware!
Earlier Work:
Theoretical Studies
[Sellis, TODS88], [Cosar et al., CIKM93], [Shim et al., DKE94],...
Set of queries {Q1, Q2, …, Qn}
For each query Qi, set of execution plans
{Pi1, Pi2, …, Pim}
Pij is a set of tasks from a common pool
Pick a plan for each query such that the
cost of tasks in the union is minimized
Not integrated with existing optimizers, no practical study
Microsoft Experience
with Paul Larson,
Microsoft Research
Prototyping MQO on
SQL-Server
Add multi-query optimization capability to
SQL-Server
Well integrated with the existing
optimization framework
– another optimization level
– minimal changes, minimal extra lines of code
First cut: exhaustive
– How slow can it be?
A working prototype by the summer-end
What (almost) already exists
in the SQL-Server Optimizer
AND/OR Query-DAG representation of plan space
Group (OR node)
Op (AND node)
A
B
C
D
What actually exists in the
SQL-Server Optimizer
Relations cloned for each use
A
B1
C1
B2
C2
D
Preprocessing Step:
Query-DAG Unification
Performed in a bottom-up traversal
A
B1
C1
B2
C2
D
Common Subexpression
Identification
Unified nodes are CSEs
Common Subexpression
A
B
C
D
Exploring the Search Space: A
Naïve Algorithm
For each set S of common
subexpressions
– materialize each node in S
– MatCost(S) = sum of materialization costs of
the nodes in S
– invoke optimizer to find the best plan for the
root and for each node S
– CompCost(S) = sum of costs of above plans
– Cost(S) = MatCost(S) + CompCost(S)
Pick S with the minimum Cost
Doing Better:
Incremental Reoptimization
Goal: best plan for Si best plan for Sj
Observation
– Best plans change for only the ancestors of
nodes in Si XOR Sj
Algorithm:
– Propagate changed costs in bottom-up
topological order from nodes in Si XOR Sj
– Update min-cost plan at each node visited
– Do not propagate further up if min-cost plan
remains unchanged at a node
Work done at IIT-Bombay
Incremental Optimization:
Example
Si =
min-cost
A
B
C
D
Incremental Optimization:
Example
Si =
Now materialized
Sj = {(B JOIN C)}
Previous min-cost
New min-cost
A
B
C
D
Current Status
A first-cut implementation working
– Lines of C++ code added: 1500 approx.
Future Work
Performance tuning and smarter data
structures needed
Ways to restrict enumeration taking
DAG structure into account
Research at IIT-Bombay:
Heuristics for MQO
with S. Sudarshan, S. Seshadri
A Greedy Heuristic
Pick nodes for materialization one at a
time, in “benefit” order
Benefit(n) = reduction in cost on materialization of n
Benefit computation is expensive
Monotonicity Assumption
Benefit of a node does not increase due to
materialization of other nodes
Exploited to avoid some benefit computations
Optimization costs decrease by 90%
A Postpass Heuristic:
Volcano-SH
No change in Volcano best plan
computation
Cost-based materialization of nodes in
best Volcano plan
Implementation easy
Low overhead
Optimizer is not aware
A Volcano Variant:
RU
Volcano-
Volcano best plan search aware of best
plans for earlier queries
– Cost based materialization of best plan nodes
that are used by later queries
Implementation easy
Low overhead
Local decisions, plan quality sensitive to query sequence
Experimental Conclusion
Greedy
– Expensive, but practical
– Overheads typically offset by plan quality
• especially for expensive “canned” queries
– Almost linear scaleup with query batch size
• typically, only the width of the Query DAG affected
Volcano-RU
– Mostly better than Volcano-SH, same overhead
– Negligible overhead over Volcano
• recommended for cheap but complex queries
Conclusion
Multi-query optimization is needed
Multi-query optimization is practical!
Multi-query optimization is an easy
next step for DAG-based optimizers
© Copyright 2026 Paperzz