Module 10: Query Optimization

Module Outline
10.1 Outline of Query Optimization
10.2 Motivating Example
10.3 Equivalences in the relational algebra
10.4 Heuristic optimization
10.5 Explosion of search space
10.6 Dynamic programming strategy (System R)

[Figure: DBMS architecture with the components Web Forms, Applications, SQL Interface, SQL Commands, Query Processor (Parser, Optimizer, Operator Evaluator, Plan Executor), Files and Index Structures, Buffer Manager, Disk Space Manager, Transaction Manager, Lock Manager, Concurrency Control, Recovery Manager, and the Database (Index Files, Data Files, System Catalog). "You are here!" marks the Optimizer.]

10.1 Outline of Query Optimization

- The success of relational database technology is largely due to the systems' ability to automatically find evaluation plans for declaratively specified queries.
- Given some (SQL) query Q, the system
  1. parses and analyzes Q,
  2. derives a relational algebra expression E that computes Q,
  3. transforms and simplifies E, and
  4. annotates the operators in E with access methods and operator algorithms to obtain an evaluation plan P.
- Discussed here: tasks 3 and 4.
- Task 3 is often called "algebraic" (or re-write) query optimization, while task 4 is also called "non-algebraic" (or cost-based) query optimization.
10.2 Motivating Example

From query to plan

- Example: List the airports from which flights operated by Swiss (airline code LX) fly to any German (DE) airport.

Airport:
code     country   name
'FRA'    'DE'      'Frankfurt'
'ZRH'    'CH'      'Zurich'
'MUC'    'DE'      'Munich'
...

Flight:
from     to        airline
'FRA'    'ZRH'     'LX'
'ZRH'    'MUC'     'LX'
'FRA'    'MUC'     'US'
...

- SQL query Q:

SELECT f.from
FROM Flight f, Airport a
WHERE f.to = a.code AND f.airline = 'LX' AND a.country = 'DE'

- Relational algebra expression E that computes Q:

π_{from}(σ_{airline='LX' ∧ country='DE'}(Flight ⋈_{to=code} Airport))
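The expression E can be evaluated literally, operator by operator. The following is a minimal Python sketch of that naive evaluation on the three sample rows of each table; the helper names `join`, `select`, and `project` are illustrative and not part of the slides.

```python
# Sample instance: only the rows shown on the slide.
airport = [
    {"code": "FRA", "country": "DE", "name": "Frankfurt"},
    {"code": "ZRH", "country": "CH", "name": "Zurich"},
    {"code": "MUC", "country": "DE", "name": "Munich"},
]
flight = [
    {"from": "FRA", "to": "ZRH", "airline": "LX"},
    {"from": "ZRH", "to": "MUC", "airline": "LX"},
    {"from": "FRA", "to": "MUC", "airline": "US"},
]

def join(left, right, pred):
    """Theta join: every combined tuple (l, r) that satisfies pred."""
    return [{**l, **r} for l in left for r in right if pred(l, r)]

def select(rel, pred):
    """Selection: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    """Projection with duplicate elimination (set semantics)."""
    return {tuple(t[a] for a in attrs) for t in rel}

# E = project_from(select_{airline='LX' and country='DE'}(Flight join_{to=code} Airport))
joined = join(flight, airport, lambda f, a: f["to"] == a["code"])
selected = select(joined, lambda t: t["airline"] == "LX" and t["country"] == "DE")
result = project(selected, ["from"])
print(result)  # {('ZRH',)}
```

On this instance the only Swiss flight into a German airport is ZRH to MUC, so the result is the single airport ZRH. The sketch materializes the full join before filtering, which is exactly the kind of work the optimizer tries to avoid.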
From query to plan (continued)

- One (of many) plan(s) P to evaluate Q (children are indented below their operator):

π_{from}                                   (scan)
  σ_{airline='LX' ∧ country='DE'}          (scan)
    NL-⋈_{to=code}
      Airport                              (heap scan)
      Flight                               (index scan on to)

10.3 Equivalences in the relational algebra

- Two relational algebra expressions E1, E2 are equivalent if, on every legal database instance, the two expressions generate the same set of tuples.
  Note: the order of tuples is irrelevant.
- Such equivalences are denoted by equivalence rules of the form
    E1 ≡ E2
  (such a rule may be applied by the system in both directions →, ←).
- We know those equivalence rules from the course "Information Systems".
Some equivalence rules

1. Conjunctive selections can be deconstructed into a sequence of individual selections:
     σ_{p1 ∧ p2}(E) ≡ σ_{p1}(σ_{p2}(E))
2. Selection operations are commutative:
     σ_{p1}(σ_{p2}(E)) ≡ σ_{p2}(σ_{p1}(E))
3. Only the last projection in a sequence of projections is needed; the others can be omitted:
     π_{L1}(π_{L2}(··· π_{Ln}(E) ···)) ≡ π_{L1}(E)
4. Selections can be combined with Cartesian products and joins:
     i)  σ_p(E1 × E2) ≡ E1 ⋈_p E2
     ii) σ_p(E1 ⋈_q E2) ≡ E1 ⋈_{p∧q} E2

- Pictorial description of 4 i): [Figure: the tree with σ_p above × (children E1, E2) is equivalent to the tree with ⋈_p (children E1, E2).]
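Rules 1, 2 and 4 i) can be sanity-checked on a concrete instance. The sketch below is only an illustration: the relations `E1`, `E2` and the predicates are made up, and agreement on one instance does not prove an equivalence (which must hold on every legal instance).

```python
from itertools import product

# Hypothetical toy relations, not taken from the slides.
E1 = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]
E2 = [{"c": 1, "d": "x"}, {"c": 3, "d": "y"}]

def sigma(pred, rel):
    """Selection."""
    return [t for t in rel if pred(t)]

def cross(r, s):
    """Cartesian product."""
    return [{**x, **y} for x, y in product(r, s)]

def theta_join(pred, r, s):
    """Join that tests pred while pairing tuples (no product is materialized)."""
    return [{**x, **y} for x in r for y in s if pred({**x, **y})]

def as_set(rel):
    """Order-insensitive view of a relation (tuple order is irrelevant)."""
    return {tuple(sorted(t.items())) for t in rel}

p1 = lambda t: t["a"] >= 2
p2 = lambda t: t["b"] <= 20
p = lambda t: t["a"] == t["c"]

# Rule 1: a conjunctive selection equals a cascade of selections.
rule1 = as_set(sigma(lambda t: p1(t) and p2(t), E1)) == as_set(sigma(p1, sigma(p2, E1)))
# Rule 2: selections commute.
rule2 = as_set(sigma(p1, sigma(p2, E1))) == as_set(sigma(p2, sigma(p1, E1)))
# Rule 4 i): selection over a Cartesian product equals a join.
rule4i = as_set(sigma(p, cross(E1, E2))) == as_set(theta_join(p, E1, E2))
print(rule1, rule2, rule4i)  # True True True
```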
5. Join operations are commutative:
     E1 ⋈_p E2 ≡ E2 ⋈_p E1
6. i)  Natural joins (equality of common attributes) are associative:
         (E1 ⋈ E2) ⋈ E3 ≡ E1 ⋈ (E2 ⋈ E3)
   ii) General joins are associative in the following sense:
         (E1 ⋈_p E2) ⋈_{q∧r} E3 ≡ E1 ⋈_{p∧q} (E2 ⋈_r E3)
       where predicate r involves attributes of E2, E3 only.
7. Selection distributes over joins in the following ways:
   i)  If predicate p involves attributes of E1 only:
         σ_p(E1 ⋈_q E2) ≡ σ_p(E1) ⋈_q E2
   ii) If predicate p involves only attributes of E1 and q involves only attributes of E2:
         σ_{p∧q}(E1 ⋈_r E2) ≡ σ_p(E1) ⋈_r σ_q(E2)
       (this is a consequence of rules 7 i) and 1).
8. Projection distributes over join as follows:
     π_{L1 ∪ L2}(E1 ⋈_p E2) ≡ π_{L1}(E1) ⋈_p π_{L2}(E2)
   if p involves attributes in L1 ∪ L2 only and Li contains attributes of Ei only.
9. The set operations union and intersection are commutative:
     E1 ∪ E2 ≡ E2 ∪ E1
     E1 ∩ E2 ≡ E2 ∩ E1
10. The set operations union and intersection are associative:
     (E1 ∪ E2) ∪ E3 ≡ E1 ∪ (E2 ∪ E3)
     (E1 ∩ E2) ∩ E3 ≡ E1 ∩ (E2 ∩ E3)
11. The selection operation distributes over ∪, ∩ and \:
     σ_p(E1 ∪ E2) ≡ σ_p(E1) ∪ σ_p(E2)
     σ_p(E1 ∩ E2) ≡ σ_p(E1) ∩ σ_p(E2)
     σ_p(E1 \ E2) ≡ σ_p(E1) \ σ_p(E2)
    Also:
     σ_p(E1 ∩ E2) ≡ σ_p(E1) ∩ E2
     σ_p(E1 \ E2) ≡ σ_p(E1) \ E2
    (this does not apply for ∪).
12. The projection operation distributes over ∪:
     π_L(E1 ∪ E2) ≡ π_L(E1) ∪ π_L(E2)

10.4 Heuristic optimization

- Query optimizers use the equivalence rules of relational algebra to improve the expected performance of a given query in most cases.
- The optimization is guided by the following heuristics:
  (a) Break apart conjunctive selections into a sequence of simpler selections
      (rule 1; preparatory step for (b)).
  (b) Move σ down the query tree for the earliest possible execution
      (rules 2, 7, 11; reduces the number of tuples processed).
  (c) Replace σ-× pairs by ⋈
      (rule 4 i); avoids large intermediate results).
  (d) Break apart lists of projection attributes and move them as far down the tree as possible; create new projections where possible
      (rules 3, 8, 12; reduces tuple widths early).
  (e) Perform the joins with the smallest expected result first.
Heuristic optimization: example

- SQL query Q:

SELECT p.ticketno
FROM   Flight f, Passenger p, Crew c
WHERE  f.flightno = p.flightno AND f.flightno = c.flightno
AND    f.date = '06-23-04' AND f.to = 'FRA'
AND    p.name = c.name AND c.job = 'Pilot'

(✎ What would be a natural language formulation of Q?)

- Canonical relational algebra expression (reflects the semantics of the SQL SELECT-FROM-WHERE block directly; children are indented below their operator):

π_{p.ticketno}
  σ_{f.flightno=p.flightno ∧ f.flightno=c.flightno ∧ ··· ∧ c.job='Pilot'}
    ×
      ×
        Flight f
        Passenger p
      Crew c
Heuristic optimization: example (steps 1 and 2)

1. Break apart conjunctive selection to prepare push-down of selections:

π_{p.ticketno}
  σ_{f.flightno=p.flightno}
    σ_{f.flightno=c.flightno}
      σ_{f.date='06-23-04'}
        σ_{f.to='FRA'}
          σ_{p.name=c.name}
            σ_{c.job='Pilot'}
              ×
                ×
                  Flight f
                  Passenger p
                Crew c

2. Push down selection as far as possible (but no further!):

π_{p.ticketno}
  σ_{f.flightno=c.flightno}
    σ_{p.name=c.name}
      ×
        σ_{f.flightno=p.flightno}
          ×
            σ_{f.to='FRA'}
              σ_{f.date='06-23-04'}
                Flight f
            Passenger p
        σ_{c.job='Pilot'}
          Crew c
Heuristic optimization: example (steps 3 and 4)

3. Re-unite sequences of selections into single conjunctive selections:

π_{p.ticketno}
  σ_{f.flightno=c.flightno ∧ p.name=c.name}
    ×
      σ_{f.flightno=p.flightno}
        ×
          σ_{f.to='FRA' ∧ f.date='06-23-04'}
            Flight f
          Passenger p
      σ_{c.job='Pilot'}
        Crew c

4. Introduce projections to reduce tuple widths:

π_{p.ticketno}
  σ_{f.flightno=c.flightno ∧ p.name=c.name}
    ×
      σ_{f.flightno=p.flightno}
        ×
          π_{f.flightno}
            σ_{f.to='FRA' ∧ f.date='06-23-04'}
              Flight f
          π_{p.ticketno,p.flightno,p.name}
            Passenger p
      π_{c.flightno,c.name}
        σ_{c.job='Pilot'}
          Crew c
Heuristic optimization: example (steps 5 and 6)

5. Combine Cartesian products and selections into joins:

π_{p.ticketno}
  ⋈_{f.flightno=c.flightno ∧ p.name=c.name}
    ⋈_{f.flightno=p.flightno}
      π_{f.flightno}
        σ_{f.to='FRA' ∧ f.date='06-23-04'}
          Flight f
      π_{p.ticketno,p.flightno,p.name}
        Passenger p
    π_{c.flightno,c.name}
      σ_{c.job='Pilot'}
        Crew c

6. Relation Passenger is presumably the largest relation, so re-order the joins to join it last (associativity of general joins, rule 6 ii)):

π_{p.ticketno}
  ⋈_{f.flightno=p.flightno ∧ p.name=c.name}
    ⋈_{f.flightno=c.flightno}
      π_{f.flightno}
        σ_{f.to='FRA' ∧ f.date='06-23-04'}
          Flight f
      π_{c.flightno,c.name}
        σ_{c.job='Pilot'}
          Crew c
    π_{p.ticketno,p.flightno,p.name}
      Passenger p
Choosing an evaluation plan

- When the optimizer annotates the resulting algebra expression E, it needs to consider the interaction of the chosen operator algorithms/access methods.
- Choosing the cheapest (in terms of I/O) algorithm for each operation independently may not yield the overall cheapest plan P.
- Example: merge join may be costlier than nested loops join (operands need to be sorted first), but yields output in sorted order (good for subsequent duplicate elimination, selection, grouping, ...).

We need to consider all possible plans and then choose the best one in a cost-based fashion.

10.5 Explosion of search space

- Consider finding the best join order for the query
    R1 ⋈ R2 ⋈ R3 ⋈ R4
  Several join tree shapes (due to associativity, commutativity of ⋈):
    left-deep:   ((· ⋈ ·) ⋈ ·) ⋈ ·   (the right input of every join is a base relation)
    bushy:       (R1 ⋈ R2) ⋈ (R3 ⋈ R4)
    right-deep:  R1 ⋈ (R2 ⋈ (R3 ⋈ R4))
    ...
- Number of different join orders for an n-way join:
    (2n − 2)! / (n − 1)!
  (n = 7: 665 280, n = 10: 17 643 225 600)
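The growth of the search space can be reproduced directly from the formula (2n − 2)!/(n − 1)!. A minimal Python sketch (the function name `join_orders` is illustrative):

```python
from math import factorial

def join_orders(n: int) -> int:
    """Number of different join orders for an n-way join: (2n-2)! / (n-1)!."""
    return factorial(2 * n - 2) // factorial(n - 1)

for n in (2, 3, 4, 7, 10):
    print(n, join_orders(n))
# 2 2
# 3 12
# 4 120
# 7 665280
# 10 17643225600
```

The values for n = 7 and n = 10 match the figures quoted above.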
Derivation of the number of possible join orderings

Let J(n) denote the number of different join orderings for a join of n argument relations. Obviously, J(n) = T(n) · n!, with T(n) the number of different binary tree shapes and n! the number of leaf permutations. [1]

We can now derive T(n) inductively:

  T(1) = 1,
  T(n) = Σ_{all possibilities} T(left subtree) · T(right subtree),

namely

  T(n) = Σ_{i=1}^{n−1} T(i) · T(n − i).

It turns out that T(n) = C(n − 1), for C(n) the n-th Catalan number,

  C(n) = 1/(n+1) · (2n choose n) = (2n)! / ((n+1)! · n!).

Substituting T(n) = C(n − 1), we obtain

  T(n) · n! = (2(n − 1))! / (n − 1)!.

[1] See (Cormen et al., 1990).

Restricting the search space

- Fact: query optimization will not be able to find the overall best plan.
- Instead: optimizers try to avoid the really bad plans (I/O cost of different plans may differ substantially!).
- Restrict the search space: consider left-deep join orders only (left is outer relation, right is inner).
  [Figure: left-deep join tree over R1, ..., R4; the right input of every join is a base relation.]
  - Left-deep trees may be evaluated in a fully pipelined fashion (inner input is stored relation),
  - intermediate results need not be written to temporary files,
  - (Block) NL-⋈ may profit from available indexes on inner relation.
- Number of possible left-deep join orders for n-way join is "only" n!.
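The derivation can be checked numerically. The sketch below evaluates the recurrence for T(n), compares it with the Catalan numbers, and prints the total number of join orders next to the n! left-deep orders; the function names are illustrative.

```python
from functools import lru_cache
from math import comb, factorial

@lru_cache(maxsize=None)
def tree_shapes(n: int) -> int:
    """T(n): number of binary tree shapes with n leaves, via the recurrence."""
    if n == 1:
        return 1
    return sum(tree_shapes(i) * tree_shapes(n - i) for i in range(1, n))

def catalan(n: int) -> int:
    """C(n) = 1/(n+1) * (2n choose n)."""
    return comb(2 * n, n) // (n + 1)

for n in range(1, 11):
    assert tree_shapes(n) == catalan(n - 1)                       # T(n) = C(n-1)
    all_orders = tree_shapes(n) * factorial(n)                    # J(n) = T(n) * n!
    assert all_orders == factorial(2 * n - 2) // factorial(n - 1)
    print(n, all_orders, factorial(n))                            # all orders vs. left-deep only
```

For n = 10 this prints 17 643 225 600 join orders overall against 3 628 800 left-deep orders, which shows how much the left-deep restriction shrinks the search space.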
Single relation plans

- Optimizer enumerates (generates) all possible plans to assess their cost.
- If query involves a single relation R only:
  Single relation plans:
  Consider each available method (e.g., heap scan, (un)clustered index scan) to access the tuples of a single relation Ri. Keep the access method involving the least estimated cost.

Cost estimates for single relation plans (System R style)

- IBM System R (≈ 1970s): first successful relational database system, introduced most of the query optimization techniques still in use today.
- Pragmatic yet successful cost model for access methods on rel. R (‖R‖: number of pages of R, |R|: number of tuples of R):

  Access method                               Cost
  access primary key index I                  Height(I) + 1    if I is B+ tree
                                              2.2              if I is hash index
  clustered index I matching predicate p      (‖I‖ + ‖R‖) × sel(p) [2]
  unclustered index I matching predicate p    (‖I‖ + |R|) × sel(p)
  sequential scan                             ‖R‖

[2] If sel(p) is unknown, assume 1/10.
Cost estimates for a single relation plan

- Query Q:

SELECT A
FROM R
WHERE B = c

- Database profile: ‖R‖ = 500, |R| = 40 000, V(B, R) = 10
- |Q| ≈ 1/V(B, R) · |R| = 1/10 · 40 000 = 4 000 tuples retrieved, where 1/V(B, R) = sel(B = c)

1. Database maintains clustered index I_B (‖I_B‖ = 50) on attribute B:
     cost = (‖I_B‖ + ‖R‖) · 1/V(B, R) = (50 + 500) · 1/10 = 55 pages
2. Database maintains unclustered index I'_B (‖I'_B‖ = 50) on attribute B:
     cost = (‖I'_B‖ + |R|) · 1/V(B, R) = (50 + 40 000) · 1/10 = 4 005 pages
3. No index support, use sequential file scan to access R:
     cost = ‖R‖ = 500 pages

- To evaluate query Q, use clustered index I_B.
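The three estimates follow mechanically from the cost model. A minimal sketch, assuming the formulas of the cost table and using exact fractions to avoid rounding; the function and variable names are illustrative:

```python
from fractions import Fraction

def sel_equality(distinct_values: int) -> Fraction:
    """Selectivity of an equality predicate B = c, estimated as 1 / V(B, R)."""
    return Fraction(1, distinct_values)

def cost_clustered_index(index_pages, rel_pages, sel):
    return (index_pages + rel_pages) * sel      # (||I|| + ||R||) * sel(p)

def cost_unclustered_index(index_pages, rel_tuples, sel):
    return (index_pages + rel_tuples) * sel     # (||I|| + |R|) * sel(p)

def cost_sequential_scan(rel_pages):
    return rel_pages                            # ||R||

# Database profile of the example.
pages_R, tuples_R, distinct_B, pages_I = 500, 40_000, 10, 50
sel = sel_equality(distinct_B)

costs = {
    "clustered index": cost_clustered_index(pages_I, pages_R, sel),
    "unclustered index": cost_unclustered_index(pages_I, tuples_R, sel),
    "sequential scan": cost_sequential_scan(pages_R),
}
best = min(costs, key=costs.get)
for name, cost in costs.items():
    print(f"{name}: {int(cost)} pages")
print("cheapest:", best)  # cheapest: clustered index
```

Note that the unclustered index (4 005 pages) is far worse than a plain sequential scan (500 pages) here, because each of the estimated 4 000 matching tuples may cost a separate page access.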
Plans for multiple relation (join) queries

- We need to make sure not to miss the best left-deep join plan.
  Degrees of freedom left:
  1. For each base relation in the query, consider all access methods.
  2. For each join operation, select a join algorithm.
- How many possible query plans are left now?
  ✎ Back-of-envelope calculation (query with n relations).
  Assume j join algorithms available, i indexes per relation:
    #plans ≈ n! · j^{n−1} · (i + 1)^n
  Example: with n = 3 relations and j = 3, i = 2:
    #plans ≈ 3! · 3^2 · 3^3 = 1458

Plan enumeration 1: example setup

- Example query (n = 3):

SELECT a.name, f.airline, c.name
FROM Airport a, Flight f, Crew c
WHERE f.to = a.code AND f.flightno = c.flightno

  (Airport = A, Flight = F, Crew = C)

- Assumptions:
  - Available join algorithms: hash join, block NL-⋈, block INL-⋈
  - Available indexes: clustered B+ tree index I on attribute Flight.to, ‖I‖ = 50
  - ‖A‖ = 500, 80 tuples/page
  - ‖F‖ = 1000, 100 tuples/page
  - ‖C‖ = 10
  - 100 F ⋈ A tuples fit on a page
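The back-of-envelope estimate #plans ≈ n! · j^{n−1} · (i + 1)^n is easy to evaluate for other parameter choices. A small sketch (the function name `plan_count` is illustrative):

```python
from math import factorial

def plan_count(n: int, j: int, i: int) -> int:
    """n! left-deep orders, j algorithms per join (n-1 joins), i+1 access methods per relation."""
    return factorial(n) * j ** (n - 1) * (i + 1) ** n

print(plan_count(3, 3, 2))  # 1458
print(plan_count(5, 3, 2))  # 2361960
```

The factor (i + 1) counts the i indexes plus the plain scan that is always available.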
Plan enumeration 2: candidate plans

- Enumerate n! left-deep join trees (3! = 6):

    (F ⋈ A) ⋈ C      (A ⋈ F) ⋈ C
    (C × A) ⋈ F      (A × C) ⋈ F
    (C ⋈ F) ⋈ A      (F ⋈ C) ⋈ A

- Prune plans with × (note: no join predicate between A, C) immediately!
- 4 candidate plans remain.

Plan enumeration 3: join algorithm choices

- Candidate plan: (A ⋈ F) ⋈ C
- Possible join algorithm choices:

    join A ⋈ F     join with C
    NL-⋈           NL-⋈
    NL-⋈           H-⋈
    H-⋈            NL-⋈
    H-⋈            H-⋈

  Repeat for remaining 3 candidate plans.
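The enumeration and pruning steps can be mimicked in a few lines. This is a sketch under simplifying assumptions: a left-deep plan is represented only by the order in which relations are joined, and only the two algorithms shown in the choices above (NL-⋈ and H-⋈) are combined; all names are illustrative.

```python
from itertools import permutations, product

relations = ["A", "F", "C"]
# Join predicates of the example query: f.to = a.code and f.flightno = c.flightno.
join_predicates = {frozenset("AF"), frozenset("FC")}

def has_cross_product(order):
    """True if some relation is added without a predicate linking it to the relations joined so far."""
    joined = {order[0]}
    for rel in order[1:]:
        if not any(frozenset((r, rel)) in join_predicates for r in joined):
            return True
        joined.add(rel)
    return False

orders = list(permutations(relations))                      # n! left-deep join orders
candidates = [o for o in orders if not has_cross_product(o)]

algorithms = ["NL", "H"]
plans = [(o, algs)
         for o in candidates
         for algs in product(algorithms, repeat=len(o) - 1)]  # one algorithm per join

print(len(orders), len(candidates), len(plans))  # 6 4 16
```

Two of the six orders start by combining A and C and are pruned, which leaves the 4 candidate plans; with two algorithms per join each candidate expands into 4 annotated plans before access methods are even considered.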
Plan enumeration 4: access method choices

- Candidate plan: (A NL-⋈ F) NL-⋈ C
- Possible access method choices:
  - heap scan A, heap scan F, heap scan C (both joins NL-⋈)
  - heap scan A, index scan on F.to, heap scan C (the join A ⋈ F becomes INL-⋈)
  Repeat for remaining candidate plans.

Plan enumeration 5: cost estimation

- Estimate cost for candidate plan:

NL-⋈
  INL-⋈
    A    (heap scan)
    F    (index scan)
  C      (heap scan)

  Cost heap scan A: 500 (pages)
  Cost of A ⋈ F:
    |A| · sel(A.code = F.to) · (‖F‖ + ‖I‖) = 40 000 · 1/40 000 · (1000 + 50) = 1 050   (F.to is key)
  ‖A ⋈ F‖ = |A ⋈ F|/100 = |F|/100 = 100 000/100 = 1 000 (pages)
  Cost of (A ⋈ F) ⋈ C: ‖A ⋈ F‖ · ‖C‖ = 1 000 · 10 = 10 000
  Total estimated cost: 500 + 1 050 + 10 000 = 11 550
Plan enumeration 5: cost estimation (further candidate plans)

- Remember:
  ‖A‖ = 500, ‖F‖ = 1 000, ‖C‖ = 10
  ‖A ⋈ F‖ = 1 000
  NL-⋈: scan left input + scan right input once for each page in left input
  H-⋈ (assume 2 passes): 2 × (scan both inputs + hash both inputs into buckets) + read hash buckets with join partners

- Candidate plan (A NL-⋈ F) NL-⋈ C, heap scans on A, F, C:
  Total estimated cost: ‖A‖ + ‖A‖ · ‖F‖ + ‖A ⋈ F‖ · ‖C‖
    = 500 + 500 · 1000 + 1000 · 10 = 510 500

- Candidate plan (A NL-⋈ F) H-⋈ C, heap scans on A, F, C:
  Total estimated cost: ‖A‖ + ‖A‖ · ‖F‖ + 2 · ‖A ⋈ F‖ + 2 · ‖C‖ + ‖(A ⋈ F) ⋈ C‖
    = 500 + 500 · 1000 + 2 · 1000 + 2 · 10 + 10 = 502 530

- Candidate plan (A H-⋈ F) H-⋈ C, heap scans on A, F, C:
  Total estimated cost: 2 · (‖A‖ + ‖F‖) + ‖A ⋈ F‖ + 2 · (‖A ⋈ F‖ + ‖C‖) + ‖C‖
    = 2 · (500 + 1000) + 1000 + 2 · (1000 + 10) + 10 = 6 030

- Candidate plan (A H-⋈ F) NL-⋈ C, heap scans on A, F, C:
  Total estimated cost: 2 · (‖A‖ + ‖F‖) + ‖A ⋈ F‖ + ‖A ⋈ F‖ · ‖C‖
    = 2 · (500 + 1000) + 1000 + 1000 · 10 = 14 000
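The four totals can be recomputed from the page counts. The sketch below encodes the two cost formulas as small helpers; the names `nl`, `hash_join`, and `OUT` are illustrative, and the value of 10 pages for the final term is taken from the totals above rather than derived.

```python
# Page counts from the example setup.
A, F, C = 500, 1000, 10
AF = 1000   # pages of the intermediate result A join F
OUT = 10    # final term of the hash-join totals (10 pages in the totals above)

def nl(outer_pages, inner_pages):
    """Block NL join, excluding the cost of producing the outer input:
    the inner input is scanned once for each page of the outer input."""
    return outer_pages * inner_pages

def hash_join(left_pages, right_pages, result_pages):
    """2-pass hash join: 2 x (both inputs) + reading the buckets with join partners."""
    return 2 * (left_pages + right_pages) + result_pages

plans = {
    "NL then NL": A + nl(A, F) + nl(AF, C),
    "NL then H":  A + nl(A, F) + hash_join(AF, C, OUT),
    "H then H":   hash_join(A, F, AF) + hash_join(AF, C, OUT),
    "H then NL":  hash_join(A, F, AF) + nl(AF, C),
}
for name, cost in sorted(plans.items(), key=lambda kv: kv[1]):
    print(f"{name}: {cost}")
# H then H: 6030
# H then NL: 14000
# NL then H: 502530
# NL then NL: 510500
```

The spread of almost two orders of magnitude between the cheapest and the most expensive of these four plans comes entirely from the choice of join algorithms.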
Repeated enumeration of identical sub-plans

- The plan enumeration reconsiders the same sub-plans over and over again.
  [Figure: several enumerated plans over A, F, C; identical sub-plans (the same join of two relations over the same scans) appear inside different plans.]
- Cost and result size of a sub-plan are independent of the larger embedding plan.
- Idea: Remember already considered sub-plans in a memoization data structure. The resulting approach is known as dynamic programming.

10.6 Dynamic programming strategy (System R)

- Divide plan enumeration into n passes (for a query with n joined relations):
  1. Pass 1 (all 1-relation plans):
     Find best 1-relation plans for each relation (i.e., select access method).
  2. Pass 2 (all 2-relation plans):
     Find best way to join plans of Pass 1 to another relation
     (generate left-deep trees: sub-plans of Pass 1 appear as outer in join).
  ...
  n. Pass n (all n-relation plans):
     Find best way to join plans of Pass n − 1 to the nth relation
     (sub-plans of Pass n − 1 appear as outer in join).

A (k − 1)-relation sub-plan P is not combined with a kth relation R unless there is a join condition between the relations in P and R, or all join conditions are already present in P (avoid × if possible).
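The pass structure can be sketched as a dynamic program over sets of relations. This is a deliberately simplified illustration, not System R's actual algorithm: it considers block NL-⋈ with heap scans only, ignores interesting orders, and never falls back to a cross product. The size of F ⋈ C (10 pages) is an assumption made for the sketch; the other page counts come from the first running example. All names are illustrative.

```python
from itertools import combinations

join_predicates = {frozenset("AF"), frozenset("FC")}

# Result sizes in pages for each set of relations that can occur as an outer or inner input.
size = {
    frozenset("A"): 500, frozenset("F"): 1000, frozenset("C"): 10,
    frozenset("AF"): 1000,  # from the example: ||A join F|| = 1 000
    frozenset("FC"): 10,    # ASSUMPTION for this sketch, not given in the example
}

def connected(subset, rel):
    """True if a join predicate links rel to some relation already in subset."""
    return any(frozenset((r, rel)) in join_predicates for r in subset)

def best_left_deep_plan(relations):
    # Pass 1: best[S] = (cost, plan) memoizes the cheapest plan for relation set S.
    best = {frozenset([r]): (size[frozenset([r])], r) for r in relations}
    # Pass k: extend the best (k-1)-relation sub-plans (as outer) by one base relation (inner).
    for k in range(2, len(relations) + 1):
        for subset in map(frozenset, combinations(relations, k)):
            for inner in subset:
                outer = subset - {inner}
                if outer not in best or not connected(outer, inner):
                    continue  # no sub-plan to extend, or the join would be a cross product
                outer_cost, outer_plan = best[outer]
                # Block NL join: one scan of the inner per page of the outer result.
                cost = outer_cost + size[outer] * size[frozenset([inner])]
                if subset not in best or cost < best[subset][0]:
                    best[subset] = (cost, (outer_plan, inner))
    return best[frozenset(relations)]

cost, plan = best_left_deep_plan(["A", "F", "C"])
print(cost, plan)  # 15010 (('C', 'F'), 'A')
```

Each entry of `best` is computed once and reused by every larger plan that embeds it, which is the memoization idea stated above. Under the assumed sizes the sketch prefers to join the small relation C with F first and A last.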
Plan enumeration: pruning, interesting orders

- For each sub-plan obtained this way, remember cost and result size estimates!
- Pruning:
  For each subset of relations joined, keep only
  - the cheapest sub-plan overall, plus
  - the cheapest sub-plans that generate an intermediate result with an interesting order of tuples.
- Interesting order determined by
  - presence of SQL ORDER BY clause in the query,
  - presence of SQL GROUP BY clause in the query,
  - join attributes of subsequent equi-joins (prepare for merge-⋈).

System R style plan enumeration

- Example query:

SELECT a.name, f.airline, c.name
FROM Airport a, Flight f, Crew c
WHERE f.to = a.code AND f.flightno = c.flightno

- Now assume:
  - Available join algorithms: merge-⋈, block NL-⋈, block INL-⋈
  - Available indexes: clustered B+ tree index I on A.code, height(I) = 3, ‖I‖_leaf = 500
  - ‖A‖ = 10 000, 5 tuples/page
  - ‖F‖ = 10, 10 tuples/page
  - ‖C‖ = 10, 20 tuples/page
  - 10 F ⋈ A tuples fit on a page, 10 F ⋈ C tuples fit on a page
System R: Pass 1 (1-relation plans)

- Access methods for A:
  1. heap scan
       cost = ‖A‖ = 10 000
  2. index scan on A.code, index I
       cost = ‖I‖ + ‖A‖ = 500 + 10 000 = 10 500
  Keep 1 and 2, since 2 has an interesting order on attribute to, which is a join attribute.
- Access method for F:
  1. heap scan
       cost = ‖F‖ = 10
- Access method for C:
  1. heap scan
       cost = ‖C‖ = 10

System R: Pass 2 (2-relation plans)

- Start with 1-relation plan to access A as outer: A ⋈_? F (the join algorithm ? is still to be chosen)
  Heap scan of A as outer:
  1. ? = NL-⋈
       cost = 10 000 + 10 000 · ‖F‖ = 10 000 + 10 000 · 10 = 110 000
  2. ? = M-⋈ (assume 2-way sort/merge):
       cost = 10 000 + 2 · 10 000 + 2 · ‖F‖ + ‖F‖ = 30 030
  Index scan of A as outer:
  3. ? = NL-⋈
       cost = 10 500 + 10 000 · ‖F‖ = 10 500 + 10 000 · 10 = 110 500
  4. ? = M-⋈ (assume 2-way sort/merge):
       cost = 10 500 + 2 · ‖F‖ + ‖F‖ = 10 530
- Keep 4 only (N.B. uses interesting order in non-optimal sub-plan!).
System R: Pass 2 (cont'd)

- Start with F as outer: F ⋈_? A/C
  A as inner:
  1. ? = NL-⋈, heap scan A
       cost = ‖F‖ + ‖F‖ · ‖A‖ = 10 + 10 · 10 000 = 100 010
  2. ? = INL-⋈, index scan A   (Keep!)
       cost = ‖F‖ + |F| · (height(I) + 1) = 10 + 100 · (3 + 1) = 410
  3. ? = M-⋈, heap scan A
       cost = ‖F‖ + ‖A‖ + 2 · (‖F‖ + ‖A‖) = 10 + 10 000 + 2 · (10 + 10 000) = 30 030
  4. ? = M-⋈, index scan A
       cost = ‖F‖ + 2 · ‖F‖ + 10 500 = 10 530
  C as inner:
  5. ? = NL-⋈
       cost = ‖F‖ + ‖F‖ · ‖C‖ = 10 + 10 · 10 = 110
  6. ? = M-⋈   (Keep!)
       cost = ‖F‖ + ‖C‖ + 2 · (‖F‖ + ‖C‖) = 10 + 10 + 2 · (10 + 10) = 60

- Start with C as outer: C ⋈_? F
  1. ? = NL-⋈
       cost = ‖C‖ + ‖C‖ · ‖F‖ = 10 + 10 · 10 = 110
  2. ? = M-⋈   (Keep!)
       cost = ‖C‖ + ‖F‖ + 2 · (‖C‖ + ‖F‖) = 10 + 10 + 2 · (10 + 10) = 60
- N.B. C ⋈ A is not enumerated because of cross product (×) avoidance.
System R: further pruning of 2-relation plans

- A ⋈ F:
  1. A M-⋈ F (index scan A, scan F): cost = 10 530, order on to
  2. F INL-⋈ A (scan F, index on A): cost = 410, no order
- C ⋈ F:
  3. C M-⋈ F (scan C, scan F): cost = 60, order on flightno
  4. F M-⋈ C (scan F, scan C): cost = 60, order on flightno
- Keep 2 and 3 or 4 (order in 1 not interesting for subsequent join(s)).

System R: Pass 3 (3-relation plans)

- Best (A ⋈ F) sub-plan: F INL-⋈ A (scan F, index on A)
  cost = 410, no order, ‖A ⋈ F‖ = 10
  1. (F INL-⋈ A) NL-⋈ C, scan C:
       cost = 410 + ‖A ⋈ F‖ · ‖C‖ = 410 + 10 · 10 = 510
  2. (F INL-⋈ A) M-⋈ C, scan C:
       cost = 410 + ‖C‖ + 2 · (‖A ⋈ F‖ + ‖C‖) = 410 + 10 + 2 · (10 + 10) = 460
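The pruning rule and the two Pass 3 extensions of the (A ⋈ F) sub-plan can be reproduced with the numbers above. The sketch below is an illustration only: the `Plan` record, the `interesting` table, and the `prune` function are hypothetical names, and the table of interesting orders encodes the reasoning that after joining A and F only an order on flightno still helps, while after joining C and F only an order on to would.

```python
from collections import namedtuple

Plan = namedtuple("Plan", "name relations cost order")

two_relation_plans = [
    Plan("A M-join F", frozenset("AF"), 10_530, "to"),
    Plan("F INL-join A", frozenset("AF"), 410, None),
    Plan("C M-join F", frozenset("CF"), 60, "flightno"),
    Plan("F M-join C", frozenset("CF"), 60, "flightno"),
]

# Orders that are still useful for the join that remains after these relations are combined.
interesting = {frozenset("AF"): {"flightno"}, frozenset("CF"): {"to"}}

def prune(plans, interesting):
    """Keep, per set of relations, the cheapest plan overall and the cheapest per interesting order."""
    kept = {}
    for p in plans:
        # An order that no later operator can exploit counts as "no order".
        order = p.order if p.order in interesting[p.relations] else None
        key = (p.relations, order)
        if key not in kept or p.cost < kept[key].cost:
            kept[key] = p
    return list(kept.values())

kept = prune(two_relation_plans, interesting)
print([p.name for p in kept])  # ['F INL-join A', 'C M-join F']

# Pass 3 extensions of the best (A join F) sub-plan (cost 410, result of 10 pages).
pages_AF, pages_C = 10, 10
cost_nl = 410 + pages_AF * pages_C                      # 510
cost_merge = 410 + pages_C + 2 * (pages_AF + pages_C)   # 460
print(cost_nl, cost_merge)
```

Plans 3 and 4 tie at cost 60, so the sketch keeps whichever it sees first, matching the "3 or 4" in the pruning step.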
System R: Pass 3 (cont'd)

- Best (C ⋈ F) sub-plan: F M-⋈ C (scan F, scan C)
  cost = 60, order on flightno, ‖C ⋈ F‖ = 10, |C ⋈ F| = 100
  1. (F M-⋈ C) NL-⋈ A, scan A:
       cost = 100 060
  2. (F M-⋈ C) INL-⋈ A, index on A:
       cost = 60 + 100 · 4 = 460
  3. (F M-⋈ C) M-⋈ A, scan A:
       cost = 30 080
  4. (F M-⋈ C) M-⋈ A, index scan A:
       cost = 10 580

System R: And the winner is ...

INL-⋈
  M-⋈
    F    (scan)
    C    (scan)
  A      (index)

  cost = 460

- Observations:
  - Best plan mixes join algorithms and exploits indexes.
  - Worst plan had cost > 100 000 (exact cost unknown due to pruning).
  - Optimization yielded ≈ 1000-fold improvement over worst plan!

Bibliography

Astrahan, M. M., Schkolnick, M., and Kim, W. (1980). Performance of the System R access path selection mechanism. In IFIP Congress, pages 487–491.
Chamberlin, D., Astrahan, M., Blasgen, M., Gray, J., King, W., Lindsay, B., Lorie, R., Mehl, J., Price, T., Putzolu, F., Selinger, P., Schkolnick, M., Shultz, D., Traiger, I., Wade, B., and Yost, R. (1981). History and evaluation of System/R. Communications of the ACM, 24(10):623–646.
Cormen, T. H., Leiserson, C. E., and Rivest, R. L. (1990). Introduction to algorithms. MIT Press.
Jarke, M. and Koch, J. (1984). Query optimization in database systems. ACM Computing Surveys, 16(2):111–152.
Kim, W., Reiner, D. S., and Batory, D. S., editors (1985). Query Processing in Database Systems. Springer-Verlag.
Ramakrishnan, R. and Gehrke, J. (2003). Database Management Systems. McGraw-Hill, New York, 3rd edition.