To Tune or not to Tune?
A Lightweight Physical Design Alerter
Nico Bruno, Surajit Chaudhuri
DMX Group, Microsoft Research
VLDB’06
A DBA’s Dilemma
Physical design tuning is important
Workloads and data change over time
Installations often become suboptimal
Current tools: good but expensive
[Figure: workload (SELECT …, INSERT …, SELECT …) → Tuner → Recommendation: {Index1, Index2, View1, View2} → DBMS]
DBAs: Avoid suboptimal installations
Periodically run expensive tools
If no improvement, wasted resources
A Lightweight Alerter
Low-overhead diagnostics
Reliable lower-bound improvement
No false positives
“Proof” with valid configuration
Upper-bound improvement
Reduce false negatives
Outline
Instrumenting the optimizer
Lower bounds
Access path selection
Index requests
Local transformations
Alerting algorithm
Upper bounds
Experimental results
Access Path Selection
Logical sub-plan: πc,d (σ a=10 (T))
Physical plans: Project(Filter(…))
Tag logical sub-plan with index request ({(a, 0.85)}, Ø, {c,d})
[Figure labels: Access Path Generation Module, Available indexes, Instrumentation, Original optimizer]
Single entry point for access-path selection (System-R, Cascades)
Intercept requests during optimization, save logical properties for later
Access Path Requests
[Figure: join tree for the query below, with joins T1.x=T2.y and T1.w=T3.z, filters T1.a=5 and T3.b=8, and base tables T1, T2, T3, annotated with these requests]
ρ1 = ({(T1.a, 2500)}, Ø, {T1.a, T1.x, T1.w}, 1)
ρ2 = ({(T2.y, 0.2)}, Ø, {T2.y}, 2500)
ρ3 = ({(T3.z, 1)}, Ø, {T3.z, T3.b}, 500)
ρ4 = ({(T3.z, 0.2)}, Ø, {T3.z, T3.b}, 2500)
ρ5 = ({(T3.b, 5000)}, Ø, {T3.b, T3.z}, 1)
SELECT T.b
FROM T1, T2, T3
WHERE T1.x=T2.y AND T1.w=T3.z AND T1.a=5 AND T3.b=8
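The request tuples above can be held in a small record type. The following is a minimal sketch, not the authors' implementation: the field names (`sargable`, `order`, `columns`, `executions`) and the `covered_by` helper are assumptions introduced for illustration, since the slides list the four components without naming them.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class IndexRequest:
    """One intercepted access-path request (all field names are assumptions)."""
    sargable: FrozenSet[Tuple[str, float]]  # (column, number paired with it on the slide)
    order: Tuple[str, ...]                  # requested order; () stands for Ø
    columns: FrozenSet[str]                 # columns listed in the third component
    executions: int                         # fourth component, assumed to be a count

    def covered_by(self, index_columns):
        """True if an index holding index_columns contains every listed column."""
        return self.columns <= set(index_columns)


# Two of the requests from the slide, copied component by component.
rho1 = IndexRequest(frozenset({("T1.a", 2500)}), (),
                    frozenset({"T1.a", "T1.x", "T1.w"}), 1)
rho2 = IndexRequest(frozenset({("T2.y", 0.2)}), (), frozenset({"T2.y"}), 2500)

print(rho1.covered_by(["T1.a", "T1.x", "T1.w"]))  # True
print(rho1.covered_by(["T1.a"]))                  # False
```

Keeping the record immutable makes it safe to store the same request in several places, for example once per query in an aggregated structure.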
Monitoring Access Path Requests
Execution plan annotated with requests and costs:
Hash Join T1.w=T3.z: ρ3, 0.45 secs (left=0.23 secs)
  Hash Join T1.x=T2.y: ρ2, 0.23 secs (left=0.08 secs)
    Filter(T1.a=5): ρ1, 0.08 secs
      Scan(T1)
    Scan(T2)
  Filter(T3.b=8): ρ5, 0.05 secs
    Scan(T3)
“AND/OR trees”
Encode relationships between requests
Aggregated across queries
2-level normalized AND/OR tree
[Figure: AND/OR trees over requests ρ1, ρ2, ρ3, ρ5 and Ø]
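One way to picture how an AND/OR tree relates requests is a recursive evaluation over it. The sketch below is an assumption-laden illustration, not the paper's algorithm: it assumes that children of an AND node can all be satisfied together (savings add up) and that children of an OR node are alternatives (only the best one counts). The tree shape and the per-request savings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Leaf:
    request: str  # request name such as "rho1"


@dataclass
class Node:
    kind: str  # "AND" or "OR"
    children: List[Union["Node", Leaf]] = field(default_factory=list)


def best_saving(tree, saving: Dict[str, float]) -> float:
    """Combine per-request savings: sum under AND, maximum under OR."""
    if isinstance(tree, Leaf):
        return saving.get(tree.request, 0.0)
    child_savings = [best_saving(child, saving) for child in tree.children]
    if tree.kind == "AND":
        return sum(child_savings)
    return max(child_savings, default=0.0)


# Hypothetical two-level tree: an AND root over OR groups of requests.
tree = Node("AND", [
    Node("OR", [Leaf("rho1")]),
    Node("OR", [Leaf("rho2"), Leaf("rho3")]),
    Node("OR", [Leaf("rho5")]),
])

# Hypothetical savings in seconds for each request.
saving = {"rho1": 0.06, "rho2": 0.10, "rho3": 0.25, "rho5": 0.03}
print(round(best_saving(tree, saving), 2))
```

A request with no useful index simply contributes zero, which plays the role of the Ø leaves in the figure.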
Local Transformations
Execution plan annotated with requests and costs:
Hash Join T1.w=T3.z: ρ3, 0.45 secs (left=0.23 secs)
  Hash Join T1.x=T2.y: ρ2, 0.23 secs (left=0.08 secs)
    Filter(T1.a=5): ρ1, 0.08 secs
      Scan(T1)
    Scan(T2)
  Filter(T3.b=8): ρ5, 0.05 secs
    Scan(T3)
Local transformation: replace Filter(T1.a=5) over Scan(T1) with Seek(I1, a=5), where I1 is an index on (a, x, w)
If the seek costs 0.02 secs, the query is 0.08 - 0.02 = 0.06 secs faster
Requests encode properties of any physical plan rooted at the corresponding operator
Allow cost inferences for varying physical designs without calling the optimizer
Result is an upper bound of the query cost after true optimization
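The arithmetic on this slide can be written as a small helper. This is a minimal sketch under the slide's own numbers; the function name `inferred_query_cost` is hypothetical, and the only rule it encodes is that a cheaper sub-plan is swapped in locally while the rest of the plan keeps its cost.

```python
def inferred_query_cost(query_cost, subplan_cost, replacement_cost):
    """Upper bound on the query cost after swapping one sub-plan locally.

    The original sub-plan is kept whenever the replacement is not cheaper,
    so the inferred cost never exceeds the original query cost.
    """
    saving = max(0.0, subplan_cost - replacement_cost)
    return query_cost - saving


# Numbers from the slide: the whole query costs 0.45 secs, the sub-plan
# under request rho1 costs 0.08 secs, and a seek on I1 would cost 0.02 secs.
new_cost = inferred_query_cost(0.45, 0.08, 0.02)
print(f"{new_cost:.2f} secs")  # 0.39 secs
```

Because the rest of the plan is left untouched, a full re-optimization could only find an equal or cheaper plan, which is why the inferred figure is an upper bound on the true cost.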
Impact of Hypothetical Indexes
Single index, single request
Exploits logical information about request
Safe inferences on subset of valid plans
Only need costs, do not “build” plans
Multiple indexes, multiple requests
Analyze all available indexes for each request
Exploit AND/OR tree for multiple requests
Measures lower bound in difference between current
and original configurations
Alerting Algorithm
For each request in T, obtain index
that results in best strategy
Repeat while space constraint is not satisfied and improvement still large enough
If size is between storage bounds and improvement is big enough, save configuration for alert
Transformations:
- Index Merge
- Index Deletion
AND/OR tree gathered during original
optimization
No additional optimizer calls!
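The loop described on this slide can be sketched as a greedy relaxation. This is a simplified illustration under stated assumptions, not the authors' algorithm: only index deletion is modeled (index merging is omitted), each index is assumed to contribute an independent, additive saving, and the names `relax`, `b_min`, `b_max`, and `min_improvement` as well as all sizes and savings are hypothetical.

```python
def relax(indexes, b_min, b_max, min_improvement):
    """Greedily shrink a configuration and collect alert candidates.

    indexes maps an index name to (size_mb, saving). Savings are assumed
    additive, which is a simplification made for this sketch.
    """
    config = dict(indexes)
    alerts = []

    def size(c):
        return sum(size_mb for size_mb, _ in c.values())

    def improvement(c):
        return sum(saving for _, saving in c.values())

    while config:
        if improvement(config) < min_improvement:
            break  # improvement is no longer large enough
        if size(config) <= b_max:
            # Size is within the storage bound: save configuration for alert.
            alerts.append((sorted(config), size(config), improvement(config)))
        if size(config) <= b_min:
            break  # space constraint satisfied, nothing left to relax
        # Delete the index that loses the least saving per MB freed.
        victim = min(config, key=lambda name: config[name][1] / config[name][0])
        del config[victim]
    return alerts


# Hypothetical starting point: the best index for each request.
indexes = {"I1": (400, 0.30), "I2": (300, 0.05), "I3": (500, 0.20)}
alerts = relax(indexes, b_min=500, b_max=1000, min_improvement=0.25)
for names, size_mb, saving in alerts:
    print(names, size_mb, round(saving, 2))
```

Every candidate is scored from numbers that were already collected, so the loop itself never needs to call the optimizer.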
Upper Bounds
Reduce false negatives
Fast Upper Bounds
Alert if: improvement is at least 25%
OR maximum improvement is 75%
Track all requests (not only AND/OR tree)
Group requests by table
Calculate “required work”
Tighter Upper Bounds
Add new optimization phase that only
considers viable plans
More expensive, but tightest upper bound
Handling Updates
Update queries are handled as:
(select core) + (update shell)
Optimizer instrumentation:
also gathers update information
Lower bounds: small changes to main
algorithm (skyline of alternatives, nonmonotonic improvement)
Upper bounds: Add necessary work for
update shells
Experimental Evaluation
Real and synthetic databases
Metrics: Execution time and Improvement
Experiments:
Monitoring Overhead (server optimization)
Diagnostics Overhead (alerting client)
Quality of bounds/recommendation
Performance
TPC-H Database and workloads
Client Overhead for lower + upper bounds
Server Overhead for Upper Bounds (Lower Bound Overhead << 1%)
Varying Workloads
[Figure: Expected Improvement (0% to 80%) vs. Configuration Size (1000 to 5000 MB); series: "W1 follows W0", "W2 different from W0", "W3 is W1 union W2"]
TPC-H workloads
W1 (first 11 queries)
W2 (last 11 queries)
W3 (mix).
Initial design tuned for W1
Varying Initial Physical Design
TPC-H database and workloads
Ci is recommendation of alerter after
executing the workload under Ci-1
Conclusions
Alerter fills gap in automatic physical
design tools
Low server/client overhead, can
monitor/diagnose very efficiently
Lower bounds are supported by valid
(applicable) configurations
Upper bounds provide additional
flexibility for defining policies
Single-Query Workloads
TPC-H Database and workloads
[Figure: Percentage Improvement (0% to 100%) per query, Q1 to Q21; series: Lower Bound, Tight Upper Bound, Fast Upper Bound]
Lower and Upper bounds for improvement
Complex Workloads
[Figure, TPCH: Expected Improvement (0% to 100%) vs. Configuration Size (1000 to 7000 MB); series: Lower Bound, Tight Upper Bound, Tuning Tool]
[Figure, MIRMS: Expected Improvement (0% to 100%) vs. Configuration Size (720 to 920 MB); series: Lower Bound, Tighter Upper Bound, Tuning Tool]