Slides

To Tune or not to Tune?
A Lightweight Physical Design Alerter
Nico Bruno, Surajit Chaudhuri
DMX Group, Microsoft Research
VLDB’06
A DBA’s Dilemma

Physical design tuning is important



Workloads and data change over time
Installations often become suboptimal
Current tools: good but expensive
Figure: workload (SELECT …, INSERT …, SELECT …), Tuner, DBMS; Recommendation: {Index1, Index2, View1, View2}

DBAs: Avoid suboptimal installations


Periodically run expensive tools
If no improvement, wasted resources
A Lightweight Alerter


Low-overhead diagnostics
Reliable lower-bound improvement



No false positives
“Proof” with valid configuration
Upper-bound improvement

Reduce false negatives
Outline

Instrumenting the optimizer
  Access path selection
  Index requests
Lower bounds
  Local transformations
  Alerting algorithm
Upper bounds
Experimental results
Access Path Selection
Logical sub-plan: πc,d (σ a=10 (T))
Tag logical sub-plan with index request: {(a, 0.85)}, Ø, {c,d}
Access Path Generation Module (uses the available indexes)
Physical plans: Project(Filter(…))

Original optimizer: single entry point for access-path selection (System-R, Cascades)
Instrumentation: intercept requests during optimization, save logical properties for later
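
The interception idea can be illustrated with a minimal sketch. It assumes a toy optimizer whose access-path selection goes through one method; the class names, the field layout of `IndexRequest`, and the stand-in selection rule are all hypothetical and are not taken from the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class IndexRequest:
    """Logical properties saved for one access-path request (hypothetical layout)."""
    sargable: Tuple[Tuple[str, float], ...]  # (column, number attached on the slide)
    order: Tuple[str, ...]                   # required sort columns (Ø becomes empty)
    columns: FrozenSet[str]                  # columns the sub-plan must produce
    executions: float = 1.0                  # how many times the sub-plan runs


class InstrumentedOptimizer:
    """Toy optimizer with a single entry point for access-path selection."""

    def __init__(self, available_indexes: Dict[str, Tuple[str, ...]]):
        self.available_indexes = available_indexes
        self.request_log: List[IndexRequest] = []

    def select_access_path(self, request: IndexRequest) -> str:
        # Instrumentation: remember the request, then do the normal work.
        self.request_log.append(request)
        return self._original_select_access_path(request)

    def _original_select_access_path(self, request: IndexRequest) -> str:
        # Stand-in for the real module (an assumption): seek an index whose
        # leading column is sargable and which covers the required columns.
        sargable_cols = {col for col, _ in request.sargable}
        for name, key_cols in self.available_indexes.items():
            if key_cols[0] in sargable_cols and request.columns <= set(key_cols):
                return f"Seek({name})"
        return "Scan"


# The request from the slide: {(a, 0.85)}, Ø, {c,d}; index I1 is made up.
optimizer = InstrumentedOptimizer({"I1": ("a", "c", "d")})
request = IndexRequest(sargable=(("a", 0.85),), order=(), columns=frozenset({"c", "d"}))
plan = optimizer.select_access_path(request)
```

The point of the sketch is that the saved request is independent of which physical plan was chosen, so it can be re-examined later against indexes that do not exist yet.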
Access Path Requests
Figure: query plan annotated with index requests
ρ1 = ({(T1.a, 2500)}, Ø, {T1.a, T1.x, T1.w}, 1)
ρ2 = ({(T2.y, 0.2)}, Ø, {T2.y}, 2500)
ρ3 = ({(T3.z, 1)}, Ø, {T3.z, T3.b}, 500)
ρ4 = ({(T3.z, 0.2)}, Ø, {T3.z, T3.b}, 2500)
ρ5 = ({(T3.b, 5000)}, Ø, {T3.b, T3.z}, 1)
Joins: T1.x=T2.y, T1.w=T3.z
Filters: T1.a=5 (on T1), T3.b=8 (on T3)
Tables: T1, T2, T3
SELECT T.b
FROM T1, T2, T3
WHERE T1.x=T2.y AND T1.w=T3.z AND T1.a=5 AND T3.b=8
Monitoring Access Path Requests
Figure: execution plan annotated with requests and costs
Hash Join (T1.w=T3.z): ρ3, 0.45 secs (left=0.23 secs)
Hash Join (T1.x=T2.y): ρ2, 0.23 secs (left=0.08 secs)
Filter(T1.a=5) over Scan(T1): ρ1, 0.08 secs
Scan(T2)
Filter(T3.b=8) over Scan(T3): ρ5, 0.05 secs

“AND/OR trees”
Encode relationships between requests
Aggregated across queries
Figure: 2-level normalized AND/OR tree (AND and OR nodes over ρ1, ρ2, ρ3, ρ5, Ø)
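
A small sketch of how an AND/OR tree over requests could be represented and evaluated. It assumes that children of an AND node can all be satisfied together (savings add up) and that children of an OR node are mutually exclusive alternatives (only the best one counts). The tree shape and the savings below are hypothetical; the exact tree from the slide is not recoverable from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Leaf:
    request: str  # request label, e.g. "rho1"


@dataclass
class Node:
    kind: str  # "AND" or "OR"
    children: List[Union["Node", Leaf]] = field(default_factory=list)


def best_saving(tree: Union[Node, Leaf], saving: Dict[str, float]) -> float:
    """Largest total saving obtainable from the requests in the tree."""
    if isinstance(tree, Leaf):
        return saving.get(tree.request, 0.0)
    values = [best_saving(child, saving) for child in tree.children]
    if not values:
        return 0.0
    if tree.kind == "AND":
        return sum(values)  # all children can be satisfied together
    if tree.kind == "OR":
        return max(values)  # only one alternative can be chosen
    raise ValueError(f"unknown node kind: {tree.kind}")


# Hypothetical tree shape and per-request savings in seconds.
query_tree = Node("AND", [
    Leaf("rho1"),
    Node("OR", [Leaf("rho2"), Leaf("rho3")]),
    Leaf("rho5"),
])
savings = {"rho1": 0.06, "rho2": 0.10, "rho3": 0.04, "rho5": 0.03}
total = best_saving(query_tree, savings)
```

Trees from several queries could be aggregated under this representation by placing them as children of one extra AND node.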
Local Transformations
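Figure: execution plan annotated with requests and costs
Hash Join (T1.w=T3.z): ρ3, 0.45 secs (left=0.23 secs)
Hash Join (T1.x=T2.y): ρ2, 0.23 secs (left=0.08 secs)
Filter(T1.a=5) over Scan(T1): ρ1, 0.08 secs
Scan(T2)
Filter(T3.b=8) over Scan(T3): ρ5, 0.05 secs
Local transformation: replace Filter(T1.a=5) over Scan(T1) with Seek(I1, a=5) on index I1(a,x,w)
If the seek costs 0.02 secs, the query is 0.08 - 0.02 = 0.06 secs faster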
Requests encode properties of any physical plan rooted at the corresponding operator
Allow cost inferences for varying physical designs without calling the optimizer
Result is an upper bound on the query cost after true optimization
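
The arithmetic of a local transformation can be written out as a short sketch. It uses the numbers from the slide (root cost 0.45 secs, sub-plan for ρ1 at 0.08 secs, hypothetical seek at 0.02 secs); the function name is made up, and treating 0.45 secs as the total query cost is an assumption based on it being the cost at the root operator.

```python
def cost_after_local_change(query_cost: float,
                            subplan_cost: float,
                            replacement_cost: float) -> float:
    """Upper bound on the query cost after swapping one sub-plan.

    The rest of the plan is left untouched, so a real re-optimization
    could only find a plan that is at least as cheap.
    """
    if replacement_cost >= subplan_cost:
        return query_cost  # the new access path does not help; keep the old one
    return query_cost - subplan_cost + replacement_cost


query_cost = 0.45    # cost at the root (request rho3), in seconds
subplan_cost = 0.08  # Filter(T1.a=5) over Scan(T1), request rho1
seek_cost = 0.02     # Seek(I1, a=5) with the hypothetical index I1(a,x,w)

new_cost = cost_after_local_change(query_cost, subplan_cost, seek_cost)
saving = query_cost - new_cost
```

Here `new_cost` comes out at about 0.39 secs and `saving` at about 0.06 secs, matching the slide, and no optimizer call is involved.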
Impact of Hypothetical Indexes

Single index, single request




Exploits logical information about request
Safe inferences on subset of valid plans
Only need costs, do not “build” plans
Multiple indexes, multiple requests


Analyze all available indexes for each request
Exploit AND/OR tree for multiple requests
Measures a lower bound on the cost difference between the current and original configurations
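
A rough sketch of the "single index, single request" and "multiple indexes" cases. The cost model is a deliberately crude assumption (fixed per-row seek and lookup constants, an index usable only if its leading column is the sargable column); none of the constants or names come from the slides. It only illustrates that costs can be estimated from the saved request without building plans.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Tuple

SEEK_COST_PER_ROW = 0.00001   # hypothetical constant, seconds per row
LOOKUP_COST_PER_ROW = 0.0001  # hypothetical constant, seconds per row


@dataclass(frozen=True)
class Request:
    sargable_column: str
    matching_rows: float
    required_columns: FrozenSet[str]
    current_cost: float  # cost of the sub-plan in the current configuration


@dataclass(frozen=True)
class Index:
    name: str
    key_columns: Tuple[str, ...]


def cost_with_index(request: Request, index: Index) -> Optional[float]:
    """Estimated cost of answering the request with one index, or None."""
    if not index.key_columns or index.key_columns[0] != request.sargable_column:
        return None  # this toy model only considers index seeks
    cost = request.matching_rows * SEEK_COST_PER_ROW
    if not request.required_columns <= set(index.key_columns):
        # Non-covering index: fetch the missing columns row by row.
        cost += request.matching_rows * LOOKUP_COST_PER_ROW
    return cost


def best_saving(request: Request, indexes: Iterable[Index]) -> float:
    """Saving from the best usable index; never negative."""
    costs = [c for c in (cost_with_index(request, i) for i in indexes) if c is not None]
    if not costs:
        return 0.0
    return max(0.0, request.current_cost - min(costs))


rho1 = Request("T1.a", 2500, frozenset({"T1.a", "T1.x", "T1.w"}), 0.08)
covering = Index("I1", ("T1.a", "T1.x", "T1.w"))
narrow = Index("I2", ("T1.a",))
saving = best_saving(rho1, [covering, narrow])
```

Because the saving is clamped at zero and each estimate corresponds to a concrete access path, the result is a conservative figure under the toy model.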
Alerting Algorithm
For each request in T, obtain the index that results in the best strategy.
Repeat while the space constraint is not satisfied and the improvement is still large enough.
Transformations:
- Index Merge.
- Index Deletion.
If size and improvement are big enough, save the configuration for the alert.
… between storage and improvement bounds


AND/OR tree gathered during original
optimization
No additional optimizer calls!
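
The loop described above can be sketched as a greedy relaxation. This is a simplified illustration, not the authors' algorithm: `size` and `improvement` are passed in as callables (in the alerter they would be derived from the AND/OR tree), the greedy choice rule is an assumption, and the index encoding is hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Optional, Tuple

Index = Tuple[str, ...]     # ("T1", "a", "x", "w"): table name, then columns
Config = FrozenSet[Index]


def merge(i1: Index, i2: Index) -> Index:
    """Index merge: keep i1's columns, then append i2's missing columns."""
    return i1 + tuple(c for c in i2[1:] if c not in i1[1:])


def relaxations(config: Config) -> List[Config]:
    """Configurations reachable by one index deletion or one index merge."""
    result = [config - {i} for i in config]
    for i1, i2 in combinations(config, 2):
        if i1[0] == i2[0]:  # only merge indexes on the same table
            result.append((config - {i1, i2}) | {merge(i1, i2)})
            result.append((config - {i1, i2}) | {merge(i2, i1)})
    return result


def alert_configuration(initial: Config,
                        size: Callable[[Config], float],
                        improvement: Callable[[Config], float],
                        max_size: float,
                        min_improvement: float) -> Optional[Config]:
    """Relax the configuration until it fits, or give up (no alert)."""
    config = initial
    while improvement(config) >= min_improvement:
        if size(config) <= max_size:
            return config  # small enough and still worthwhile: alert
        candidates = relaxations(config)
        if not candidates:
            return None
        # Greedy rule (an assumption): keep the most improvement,
        # breaking ties in favor of the smaller configuration.
        config = max(candidates, key=lambda c: (improvement(c), -size(c)))
    return None


# Toy inputs: 100 MB per indexed column, fixed improvement per index.
GAIN = {("T1", "a", "x", "w"): 0.3, ("T3", "b", "z"): 0.2}
toy_size = lambda cfg: sum(100 * len(i[1:]) for i in cfg)
toy_improvement = lambda cfg: sum(GAIN.get(i, 0.0) for i in cfg)

start = frozenset(GAIN)
result = alert_configuration(start, toy_size, toy_improvement, 300, 0.25)
```

With these toy inputs the 500 MB starting configuration is relaxed by deleting the T3 index, which leaves a 300 MB configuration with an improvement of 0.3.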
Upper Bounds

Reduce false negatives
Alert if: improvement is at least 25% OR maximum improvement is 75%
Fast Upper Bounds
Track all requests (not only AND/OR tree)
Group requests by table
Calculate “required work”
Tighter Upper Bounds


Add new optimization phase that only
considers viable plans
More expensive, but tightest upper bound
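
One possible reading of the example alerting policy, written as a sketch. It assumes "improvement" refers to the guaranteed lower bound and "maximum improvement" to the upper bound, and that the 75% figure is a minimum threshold; the function and parameter names are hypothetical.

```python
def should_alert(lower_bound: float,
                 upper_bound: float,
                 min_guaranteed: float = 0.25,
                 min_potential: float = 0.75) -> bool:
    """Alert when enough improvement is guaranteed, or enough is possible.

    Bounds are fractions of the current workload cost (0.0 to 1.0).
    """
    if not 0.0 <= lower_bound <= upper_bound <= 1.0:
        raise ValueError("expected 0 <= lower_bound <= upper_bound <= 1")
    return lower_bound >= min_guaranteed or upper_bound >= min_potential


guaranteed_case = should_alert(0.30, 0.50)  # lower bound alone triggers
potential_case = should_alert(0.10, 0.80)   # only the upper bound triggers
quiet_case = should_alert(0.10, 0.40)       # neither threshold is reached
```

The second case is the one a lower bound alone would miss, which is how the upper bound reduces false negatives under this reading.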
Handling Updates

Update queries are handled as:
(select core) + (update shell)



Optimizer instrumentation:
also gathers update information
Lower bounds: small changes to main
algorithm (skyline of alternatives, nonmonotonic improvement)
Upper bounds: Add necessary work for
update shells
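
A toy sketch of why improvement becomes non-monotonic once updates are considered. It assumes per-index savings on the select core and per-index maintenance costs in the update shell simply add up, which is a simplification made for illustration; all names and numbers are hypothetical.

```python
from typing import Dict, Iterable


def net_saving(config: Iterable[str],
               select_saving: Dict[str, float],
               maintenance_cost: Dict[str, float]) -> float:
    """Saving on select cores minus the extra work in update shells."""
    indexes = list(config)
    gained = sum(select_saving.get(i, 0.0) for i in indexes)
    paid = sum(maintenance_cost.get(i, 0.0) for i in indexes)
    return gained - paid


# Hypothetical numbers in seconds.
select_saving = {"I1": 0.06, "I2": 0.02}
maintenance_cost = {"I1": 0.01, "I2": 0.05}

only_i1 = net_saving(["I1"], select_saving, maintenance_cost)
both = net_saving(["I1", "I2"], select_saving, maintenance_cost)
```

Adding I2 lowers the net saving in this example, so a larger configuration is not necessarily a better one.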
Experimental Evaluation

Real and synthetic databases
Metrics: Execution time and Improvement

Experiments:




Monitoring Overhead (server optimization)
Diagnostics Overhead (alerting client)
Quality of bounds/recommendation
Performance

TPC-H Database and workloads
Client Overhead for lower + upper bounds
Server Overhead for Upper Bounds (Lower Bound Overhead << 1%)
Varying Workloads
Figure: Expected Improvement (0% to 80%) vs. Configuration Size (1000 to 5000 MB); series: W1 follows W0, W2 different from W0, W3 is W1 union W2

TPC-H workloads




W1 (first 11 queries)
W2 (last 11 queries)
W3 (mix).
Initial design tuned for W1
Varying Initial Physical Design


TPC-H database and workloads
Ci is recommendation of alerter after
executing the workload under Ci-1
Conclusions

Alerter fills gap in automatic physical
design tools



Low server/client overhead, can
monitor/diagnose very efficiently
Lower bounds are supported by valid
(applicable) configurations
Upper bounds provide additional
flexibility for defining policies
Single-Query Workloads
TPC-H Database and workloads
Figure: Lower and Upper bounds for improvement. Percentage Improvement (0% to 100%) per query (Q1 to Q21); series: Lower Bound, Tight Upper Bound, Fast Upper Bound
Complex Workloads
Figure (TPCH): Expected Improvement (0% to 100%) vs. Configuration Size (1000 to 7000 MB); series: Lower Bound, Tight Upper Bound, Tuning Tool
Figure (MIRMS): Expected Improvement (0% to 100%) vs. Configuration Size (720 to 920 MB); series: Lower Bound, Tighter Upper Bound, Tuning Tool