Analyzing Plan Diagrams of
Database Query Optimizers
Naveen Reddy Jayant Haritsa
Database Systems Lab
Indian Institute of Science
Bangalore, INDIA
Query Execution Plans
SQL, the standard database query language,
is declarative in nature
– does not specify how the query should be evaluated
– Example:
select StudentName, CourseName
from STUDENT, COURSE, REGISTER
where STUDENT.RollNo = REGISTER.RollNo and
REGISTER.CourseNo = COURSE.CourseNo
join order and techniques are left unspecified
DBMS query optimizer identifies efficient
execution strategy: “query execution plan”
August 2005
Picasso (VLDB)
2
Example Query and Plan
select c_custkey, c_name, c_acctbal, c_address, c_phone, c_comment
from customer, orders, nation
where c_custkey = o_custkey and c_nationkey = n_nationkey and
o_orderdate < date ('2001-11-09') and n_name = 'IRAQ'
RETURN 201,689
HSJOIN 201,689
TBSCAN 175,025
ORDERS
HSJOIN 26,571
TBSCAN 26,512
CUSTOMER
August 2005
Picasso (VLDB)
TBSCAN 50
NATION
3
Query Plan Selection
Core technique
Query (Q)
Query Optimizer
(dynamic programming)
DB catalogs
Minimum Cost
Plan P(Q)
Cost Model
Cost difference between best plan
choice and a sub-optimal choice can
be enormous (orders of magnitude)
August 2005
Picasso (VLDB)
4
Plan and Cost Diagrams
Given a query, the optimizer’s plan choice is a function
(among other factors) of the selectivities of the base
relations participating in the query
– selectivity is the estimated number of rows of a relation
relevant to producing the final result
A plan diagram is a pictorial enumeration of the
plan choices of a database query optimizer over
the relational selectivity space
A cost diagram is a visualization of the associated
(estimated) plan execution costs over the same
relational selectivity space
August 2005
Picasso (VLDB)
5
Example Query [Q7 of TPC-H]
select
supp_nation, cust_nation, l_year, sum(volume) as revenue
from
(select n1.n_name as supp_nation, n2.n_name as cust_nation,
extract(year from l_shipdate) as l_year,
l_extendedprice * (1 - l_discount) as volume
from supplier, lineitem, orders, customer, nation n1, nation n2
where s_suppkey = l_suppkey and o_orderkey = l_orderkey and
c_custkey = o_custkey and s_nationkey = n1.n_nationkey
and c_nationkey = n2.n_nationkey and
((n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY') or
(n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')) and
l_shipdate between date '1995-01-01' and date '1996-12-31'
and o_totalprice < C1 and c_acctbal < C2 ) as shipping
group by supp_nation, cust_nation, l_year
order by supp_nation, cust_nation, l_year
August 2005
Picasso (VLDB)
6
Example Plan Diagram
August 2005
Picasso (VLDB)
7
Specific Plan Choices
August 2005
Picasso (VLDB)
8
Example Cost Diagram
August 2005
Picasso (VLDB)
9
PICASSO
Picasso
A Java tool that, given a query template, automatically
generates plan and cost diagrams
– Fires queries at user-specified granularity (default 100x100 grid)
– Currently restricted to 2-D plan diagrams and 3-D cost diagrams
Using the tool, enumerated the plan/cost diagrams
produced by industrial-strength query optimizers on
TPC-H-based queries
– IBM DB2 v8, Oracle 9i and Microsoft SQL Server 2000
Plan diagrams appear similar to cubist paintings
[ Pablo Picasso founder of the cubist painting genre ]
August 2005
Picasso (VLDB)
11
Picasso GUI
August 2005
Picasso (VLDB)
12
Testbed Environment
Database
– TPC-H database (1 GB scale) representing a manufacturing
environment, featuring the following relations:
Relation
Query Set
– Queries based on TPC-H
benchmark [Q1 through Q22]
– Uniform 100x100 grid (10000 queries)
[0.5%, 0.5%] to [99.5%, 99.5%]
Relational Engines
– Default installations (with all
optimization features on)
– Stats on all columns, no extra indices
Cardinality
REGION
5
NATION
25
SUPPLIER
CUSTOMER
10000
150000
PART
200000
PARTSUPP
ORDERS
800000
1500000
LINEITEM
6001215
Computational Platform
– Pentium-IV 2.4 GHz, 1GB RAM, Windows XP Professional
August 2005
Picasso (VLDB)
13
RESULTS
Optimizers randomly identified as Opt A, Opt B, Opt C
NOT intended to make comparisons across optimizers
Black-box testing our conclusions are speculative
Full result listing at http://dsl.serc.iisc.ernet.in/projects/PICASSO/
Smooth Plan Diagram [Q7, OptB]
August 2005
Picasso (VLDB)
15
Complex
Plan Diagram [Q8, OptA*]
Highly irregular
plan boundaries
Increases to
Extremely
fine80 plans
with
grained grid
300x300
coverage
!
(P68 ~ 0.02%)
Intricate
Complex
Patterns
August 2005
Picasso (VLDB)
16
Cost Diagram [Q8, OptA*]
August 2005
Picasso (VLDB)
17
Skew in Plan Space Coverage
Opt A
TPC-H
Query
Opt C
Opt B
Plan
80%
Gini
Plan
80%
Gini
Plan
80%
Gini
Cardinality Coverage Index Cardinality Coverage Index Cardinality Coverage Index
2
22
18%
0.76
14
21%
0.72
35
20%
0.77
5
21
19%
0.81
14
21%
0.74
18
17%
0.81
7
13
23%
0.73
6
50%
0.46
19
15%
0.79
8
8
9
9
31
31
63
63
16%
16%
9%
9%
0.81
0.81
0.88
0.88
25
25
44
44
25%
25%
27%
27%
0.72
0.72
0.70
0.70
38
38
41
41
18%
18%
12%
12%
0.79
0.79
0.83
0.83
10
24
16%
0.78
9
22%
0.69
8
25%
0.75
18
18
21
5
5
27
60%
60%
22%
0.33
0.33
0.74
13
13
6
38%
38%
17%
0.57
0.57
0.80
5
5
22
20%
20%
18%
0.75
0.75
0.81
Avg(dense)
28.7
17%
0.79
24.5
23%
0.72
28.8
16%
0.79
Dense Plan Cardinality 10
August 2005
80-20 Rule
Gini index > 0.5
Picasso (VLDB)
18
Cost Domination Principle
Cost of executing any
“foreign” query point in
the first quadrant of qs
is an upper bound on
the cost of executing
the foreign plan at qs
qs
Cost of executing qs
with Plans P4 and P1 is
less than 91 and 90,
respectively.
Cost of Query point
qs with plan P2 is 88
August 2005
Picasso (VLDB)
19
Cost Domination Principle
Dominating Point
Given a pair of distinct points q1 (x1,y1) and q2(x2,y2) in
2-D selectivity space, we say that q2 ≻ q1, if and only if
x2 ≥ x1, y2 ≥ y1 and result cardinality Rq2 ≥ Rq1
Cost Domination Principle
If points q1 (x1,y1) and q2(x2,y2) are associated with distinct
plans P1 and P2 respectively, in the original space, the cost
of executing query q1 with plan P2 is upper-bounded by the
cost of executing q2 with P2, if and only if q2 ≻ q1
August 2005
Picasso (VLDB)
20
Plan Swallowing Algorithm
1.
2.
3.
4.
For each query point qs, look for replacements by
“foreign” query points that are in the first quadrant
relative to qs as the origin.
For the foreign points that are within λ (e.g. λ=10%)
threshold, choose point with lowest cost as potential
replacement.
An entire plan is “swallowed” only if all its query points
can be swallowed by a single plan or group of plans.
Order the plans in ascending order of size; go up the
list, checking for possibility of swallowing each plan.
August 2005
Picasso (VLDB)
21
Complex Plan
[Q8,
OptA*]
Reduced
PlanDiagram
Diagram
(λ=10%)
Reduced
to 7 plans
from 68
Note: Plan Reduction ≠ Change in Optimization Levels
August 2005
Picasso (VLDB)
22
Plan Cardinality Reduction by Swallowing
Opt A
TPC-H
Query
Opt C
%
Avg
Max
%
Avg
Max
%
Avg
Max
Card
Cost
Cost
Card
Cost
Cost
Card
Cost
Cost
Decrease Increase Increase Decrease Increase Increase Decrease Increase Increase
2
59.2
5
67.3
7
46.1
8
8
9
87.6
87.6
84.4
10
67.6
18
40.0
21
59.8
Avg(dense)
67.4
August 2005
Opt B
1.0
1.0
2.6
2.6
0.1
0.1
0.4
0.4
0.4
1.6
1.6
0.8
0.8
0.1
0.1
0.0
0.0
0.9
0.9
4.4
64.2
0.6
0.6
5.9
77.1
8.1
42.9
0.1
0.1
0.6
61.1
9.5
16.6
0.4
0.4
0.7
54.5
9.4
9.4
8.6
84.0
84.0
36.4
0.9
0.9
0.9
1.4
1.4
9.1
9.1
8.9
86.8
86.8
80.5
4.4
44.4
0.5
0.5
6.1
62.5
0.5
46.2
3.7
3.7
9.6
00.0
0.2
66.7
0.9
0.9
2.5
68.2
6.4
56.9
0.7
0.7
6.1
71.4
Picasso (VLDB)
Average Cost
Increase < 2%
3.2
3.2
0.2
0.2
1.1
1.1
1.2
1.2
1.2
2.1
2.1
0.4
0.4
0.0
0.0
0.7
0.7
0.9
0.9
6.4
8.1
9.5
8.4
8.4
8.3
2.4
0.0
6.9
6.4
23
Interesting Patterns
Duplicates and Islands
Plan Switch Points
Footprint Pattern
Speckle Pattern
Duplicates and Islands [Q10, OptA]
Duplicate locations of P3
P18 is an
island within P6
Duplicate
locations of P10
August 2005
Picasso (VLDB)
25
Duplicates and Islands [Q5, OptC]
Three duplicates
of P7, which are
islands within P1
August 2005
Picasso (VLDB)
26
Duplicates and Islands Removal
Databases
# Duplicates
Original Reduced
# Islands
Original
Reduced
Opt A
130
13
38
3
Opt B
80
15
1
0
Opt C
55
7
8
3
With Plan Reduction by Swallowing,
significant decrease in duplicates and islands
August 2005
Picasso (VLDB)
27
Plan Switch Points [Q9, OptA]
Plan Switch Point:
line parallel to axis with a
plan shift for all plans
bordering the line.
Hash-Join sequence
PARTSUPP►◄SUPPLIER►◄PART
is altered to
PARTSUPP►◄PART►◄SUPPLIER
August 2005
Picasso (VLDB)
28
Venetian Blinds [Q9, Opt B]
Six plans simultaneously
change with rapid
alternations to produce a
“Venetian blinds” effect.
Left-deep hash join across
NATION, SUPPLIER and
LINEITEM relations gets
replaced by a right-deep
hash join.
August 2005
Picasso (VLDB)
29
Footprint Pattern [Q7, Opt A]
P7 exhibits a thin broken
curved pattern in the
middle of P2’s region.
P2 has sort-merge-join
at the top of the plan
tree, while P7 has hashjoin
August 2005
Picasso (VLDB)
30
Speckle Pattern [Q17, Opt A*]
An additional sort operation is
present on PART relation in
P2, whose cost is very low
August 2005
Picasso (VLDB)
31
Non-Monotonic Cost Behavior
Plan-Switch Non-Monotonic Costs
Intra-Plan Non-Monotonic Costs
Plan-Switch Non-Monotonic CostsPresence
[Q2,ofOpt
A]
Rules?
Parameterized changes
in search space?
50%
Selectivity
26%
Selectivity
Plan Diagram
August 2005
26%: Cost decreases by a factor of 50
Cost Diagram
50%: Cost increases
by a factor of 70
Picasso (VLDB)
33
Intra-Plan Non-Monotonic Costs [Q21, Opt A]
Plans P1, P3,
P4 and P6
Nested loops join whose cost decreases
with increasing input cardinalities
Cost Diagram
Plan Diagram
August 2005
Picasso (VLDB)
34
Relationship to PQO
PQO (Parametric Query Optimization)
Active research area for last 15 years
Identify the optimal set of plans for the entire relational
selectivity space at compile time
At run time, use actual selectivity values to identify the
appropriate plan choice
Assumptions
– Plan Convexity: If a plan is optimal at two points, then it is
optimal at all points on the straight line joining them
– Plan Uniqueness: An optimal plan appears at only one
contiguous region in the entire space
– Plan Homogeneity: An optimal plan is optimal within the entire
region enclosed by its plan boundaries
August 2005
Picasso (VLDB)
36
Validity of PQO [Q8, Opt A]
Plan uniqueness is violated
by P4
Plan Homogeneity is
Plan
Convexity
is
violated
by P14
severely violated by
regions covered by P12
(dark green) and P16
(light gray).
August 2005
Picasso (VLDB)
37
Note:
PQO is a more viable proposition in the
space of reduced plan diagrams due to
the removal of most duplicates and
islands
August 2005
Picasso (VLDB)
38
Conclusions
Conceived and developed the Picasso tool for
automatically generating plan and cost diagrams
Presented and analyzed representative plan and cost
diagrams on popular commercial query optimizers
– Optimizers make fine grained choices
– Complexity of plan diagrams can be drastically reduced
without materially affecting the query processing quality
– Plan optimality regions can have intricate patterns and
complex boundaries
– Non-Monotonic cost behavior exists where increasing result
cardinalities decrease the estimated cost
– Basic assumptions underlying research literature on PQO do
not hold in practice; but hold approximately for reduced plan
diagrams
August 2005
Picasso (VLDB)
39
Future Work
Extend Picasso to generate higher-dimensional
plan and cost diagrams
Port Picasso to Sybase and PostgreSQL
Conduct a deeper investigation on the query
features that result in high plan cardinalities
Investigate whether it is possible to simplify the
optimizer, so as to be able to directly produce
reduced plan diagrams
Public release of Picasso software (by end-2005)
August 2005
Picasso (VLDB)
40
QUESTIONS ?
END PICASSO
© Copyright 2026 Paperzz