Overview of Query Evaluation

Examples of Physical Query Plan Alternatives
Selections from Chapters 12, 14, 15
Query Optimization

NOTE: Relational query languages provide a wide variety
of ways in which a user can express a query.

HENCE: The system has many options for evaluating a query.

The optimizer is important for query performance.
- Generates alternative plans.
- Chooses the plan with the least estimated cost.

Ideally, find the best plan.
Realistically, consistently find a reasonably good one.
A Query (Evaluation) Plan
An extended relational algebra tree
Annotations at each node indicate:
- access methods to use for each table.
- implementation methods used for each relational operator.
[Figure: query plan tree. Projection on sname (on-the-fly) over selection bid=100 and rating > 5 (on-the-fly) over join sid=sid (simple nested loops) of Reserves and Sailors.]
Multi-operator Queries: Pipelined Evaluation
• On-the-fly (pipelined): The result of one operator is passed
directly to the next operator, without creating a temporary
table to hold the intermediate result.
• Materialized: Otherwise, the intermediate result must be
written to a temporary table before the next operator reads it.
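The difference between the two modes can be sketched with Python generators. This is only an illustration, not how any particular DBMS implements operators: the operator names (scan, select, project, materialize) and the three sample Sailors rows are made up for the example.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]

def scan(table: List[Row]) -> Iterator[Row]:
    """Leaf operator: yield the rows of a stored table one at a time."""
    yield from table

def select(rows: Iterable[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    """Pipelined selection: hand each qualifying row straight to the consumer."""
    for row in rows:
        if pred(row):
            yield row

def project(rows: Iterable[Row], cols: List[str]) -> Iterator[Row]:
    """Pipelined projection: keep only the requested columns."""
    for row in rows:
        yield {c: row[c] for c in cols}

def materialize(rows: Iterable[Row]) -> List[Row]:
    """Store the whole intermediate result (a stand-in for a temp table)."""
    return list(rows)

# Hypothetical sample data, for illustration only.
sailors = [
    {"sid": 22, "sname": "dustin", "rating": 7},
    {"sid": 31, "sname": "lubber", "rating": 8},
    {"sid": 58, "sname": "rusty", "rating": 3},
]

# On-the-fly: no intermediate list exists between select and project.
pipelined = project(select(scan(sailors), lambda r: r["rating"] > 5), ["sname"])

# Materialized: the selection result is fully stored before projection starts.
temp = materialize(select(scan(sailors), lambda r: r["rating"] > 5))
materialized = project(temp, ["sname"])

pipelined_result = list(pipelined)
materialized_result = list(materialized)
```

Both variants return the same rows; they differ only in whether the intermediate result is ever stored as a whole.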
Alternative Plans: Schema Examples
Sailors (sid: integer, sname: string, rating: integer, age: real)
Reserves (sid: integer, bid: integer, day: date, rname: string)

Reserves:
- Each tuple is 40 bytes long,
- 100 tuples per page,
- 1000 pages.
Sailors:
- Each tuple is 50 bytes long,
- 80 tuples per page,
- 500 pages.
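These statistics drive every cost estimate in the examples. A small sketch that records them and derives the tuple counts; the class name TableStats and its fields are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TableStats:
    """Catalog statistics for one table, as given in the schema example."""
    name: str
    tuple_bytes: int
    tuples_per_page: int
    pages: int

    @property
    def tuples(self) -> int:
        # Total number of tuples, assuming every page is full.
        return self.tuples_per_page * self.pages

    @property
    def bytes_per_page(self) -> int:
        # Bytes of tuple data stored on each page.
        return self.tuple_bytes * self.tuples_per_page

RESERVES = TableStats("Reserves", tuple_bytes=40, tuples_per_page=100, pages=1000)
SAILORS = TableStats("Sailors", tuple_bytes=50, tuples_per_page=80, pages=500)
```

Under the full-page assumption, Reserves holds 100,000 tuples and Sailors holds 40,000, and both tables store 4000 bytes of tuple data per page.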
Alternative Plans: Motivating Example
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND
R.bid=100 AND S.rating>5
RA Tree: projection on sname over selection bid=100 and rating > 5 over join sid=sid of Reserves and Sailors.
Costs:
1. Scan Sailors:
- For each page of Sailors, scan Reserves
- 500+500*1000 I/Os
Or,
2. Scan Reserves:
- For each page of Reserves, scan Sailors
- 1000+1000*500 I/Os
[Figure: plan. Projection on sname (on-the-fly) over selection bid=100 and rating > 5 (on-the-fly) over join sid=sid (simple nested loops) of Reserves and Sailors.]
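The two cost expressions above follow one pattern: read the outer table once, and scan the inner table once per outer page. A minimal sketch of that arithmetic; the function name is hypothetical, and the cost of writing the result is ignored, as in the slide.

```python
def page_nested_loops_cost(outer_pages: int, inner_pages: int) -> int:
    """I/O cost of a page-at-a-time nested loops join.

    The outer table is read once; the inner table is scanned once
    for every page of the outer table.
    """
    return outer_pages + outer_pages * inner_pages

SAILORS_PAGES = 500
RESERVES_PAGES = 1000

# Option 1: Sailors as the outer table (500 + 500*1000).
sailors_outer = page_nested_loops_cost(SAILORS_PAGES, RESERVES_PAGES)

# Option 2: Reserves as the outer table (1000 + 1000*500).
reserves_outer = page_nested_loops_cost(RESERVES_PAGES, SAILORS_PAGES)
```

The smaller table as outer is slightly cheaper (500,500 versus 501,000 I/Os), but both are dominated by the 500,000 I/Os of repeated inner scans.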
Alternative Plans: Motivating Example
Cost: 500+500*1000 I/Os
Almost the worst plan!
Reasons:
- selections could be "pushed" earlier,
- no use made of indexes
Goal of optimization: To find more efficient plans that compute
the same answer.
Alternative Plans 1 (No Indexes)
Main difference: push selects.
- Reduce size of table to be joined
With 5 buffers, cost of plan:
- Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats, uniform distribution).
- Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings).
- Sort T1 (2*2*10), sort T2 (2*4*250), merge (10+250)
- Total: 4060 page I/Os.
Optimization 1: block nested loops join:
- join cost = 10+4*250, total cost = 2770.
Optimization 2: "push" projections:
- T1 has only sid, T2 only sid and sname:
- T1 fits in 3 pages, cost of BNL drops to under 250 pages, total < 2000.
[Figure: plan. Projection on sname (on-the-fly) over sort-merge join sid=sid of selection bid=100 on Reserves (scan; write to temp T1) and selection rating > 5 on Sailors (scan; write to temp T2).]
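The totals above can be reproduced step by step. The sketch below assumes the usual textbook cost formulas: external merge sort reads and writes every page once per pass, pass 0 builds runs of B pages, later passes merge B-1 runs at a time, and block nested loops uses B-2 buffer pages for the outer block. All function and variable names are hypothetical.

```python
import math

def sort_passes(pages: int, buffers: int) -> int:
    """Number of passes of external merge sort with `buffers` buffer pages."""
    runs = math.ceil(pages / buffers)            # pass 0: runs of `buffers` pages
    passes = 1
    while runs > 1:
        runs = math.ceil(runs / (buffers - 1))   # (buffers - 1)-way merge
        passes += 1
    return passes

def sort_cost(pages: int, buffers: int) -> int:
    """Each pass reads and writes every page once."""
    return 2 * sort_passes(pages, buffers) * pages

def block_nested_loops_cost(outer_pages: int, inner_pages: int, buffers: int) -> int:
    """Read the outer once; scan the inner once per block of (buffers - 2) outer pages."""
    blocks = math.ceil(outer_pages / (buffers - 2))
    return outer_pages + blocks * inner_pages

BUFFERS = 5
T1_PAGES = 1000 // 100        # bid=100 keeps 1 of 100 boats: 10 pages
T2_PAGES = 500 * 5 // 10      # rating > 5 keeps 5 of 10 ratings: 250 pages

# Scan both tables and write the two temporary tables.
selection_cost = (1000 + T1_PAGES) + (500 + T2_PAGES)

# Sort both temps, then merge them in one pass over each.
sort_merge_join_cost = (
    sort_cost(T1_PAGES, BUFFERS) + sort_cost(T2_PAGES, BUFFERS) + (T1_PAGES + T2_PAGES)
)
sort_merge_total = selection_cost + sort_merge_join_cost

# Optimization 1: replace the sort-merge join with block nested loops (T1 as outer).
bnl_join_cost = block_nested_loops_cost(T1_PAGES, T2_PAGES, BUFFERS)
bnl_total = selection_cost + bnl_join_cost
```

With these assumptions the sort-merge plan comes to 4060 I/Os and the block nested loops plan to 2770, matching the figures above.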
Alternative Plan: Using Index?
Push selections down?
What indices help here?
- Index on Reserves.bid?
- Index on Sailors.sid?
- Index on Sailors.rating?
Example Plan: With Index
With index on Reserves.bid:
- Assume 100 different bid values.
- Assume 100,000 tuples.
- Assume 100 tuples/disk page.
- We get 100,000/100 = 1000 tuples
- on 1000/100 = 10 disk pages.
- If index clustered, cost = 10 I/Os.
[Figure: plan. Projection on sname (on-the-fly) over selection rating > 5 (on-the-fly) over join sid=sid (index nested loops, with pipelining) of selection bid=100 on Reserves (use hash index; do not write result to temp) and Sailors.]
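The estimate for the bid=100 selection is just two divisions under a uniform-distribution assumption. A small sketch, with a hypothetical function name; like the slide, it ignores the I/Os needed to traverse the index itself.

```python
import math
from typing import Tuple

def clustered_selection_estimate(
    tuples: int, distinct_values: int, tuples_per_page: int
) -> Tuple[int, int]:
    """Estimate (matching tuples, pages read) for an equality selection
    through a clustered index, assuming values are uniformly distributed."""
    matching = tuples // distinct_values
    pages = math.ceil(matching / tuples_per_page)
    return matching, pages

# Reserves: 100,000 tuples, 100 distinct bid values, 100 tuples per page.
matching_tuples, selection_pages = clustered_selection_estimate(100_000, 100, 100)
```

With a clustered index the 1000 matching tuples sit on 10 consecutive pages, which is why the selection costs only 10 I/Os.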
Example Plan Continued
• Index on Sailors.sid:
- Join column sid is a key for Sailors.
- At most one matching tuple, so an unclustered index on sid is OK.
• Cost?
- For each Reserves tuple (1000): get the matching Sailors tuple (1.2 I/O).
- So total 10 + 1000*1.2 = 1210 I/Os.
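The total combines the selection cost with one index probe per selected Reserves tuple. A short sketch of that sum; the function name is hypothetical, and the 1.2 I/Os per probe is the slide's assumed average for a hash index lookup.

```python
def index_nested_loops_cost(
    outer_selection_ios: float, outer_tuples: int, ios_per_probe: float
) -> float:
    """Cost of retrieving the outer tuples plus one index probe per outer tuple.

    The join result is pipelined, so no temporary table is written.
    """
    return outer_selection_ios + outer_tuples * ios_per_probe

SELECTION_IOS = 10      # bid=100 via clustered index on Reserves.bid
SELECTED_TUPLES = 1000  # Reserves tuples with bid=100
IOS_PER_PROBE = 1.2     # assumed average cost of one hash index probe on Sailors.sid

inl_total = index_nested_loops_cost(SELECTION_IOS, SELECTED_TUPLES, IOS_PER_PROBE)
```

Almost all of the cost comes from the 1000 probes; the selection itself contributes only 10 I/Os.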
Alternative Plan: With Second Index
• Selection pushing down?
- Push (rating>5) before the join?
• Answer:
- No, because of the availability of the sid index on Sailors.
• Reason:
- There is no index on the selection result.
- The selection would then require a scan of Sailors.
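A toy in-memory version of this plan shows where the rating test sits: after the index probe, applied on the fly, rather than before the join. The sample rows and the dictionary-based "hash indexes" are made up for the illustration and stand in for real on-disk structures.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, List

# Hypothetical sample data, for illustration only.
reserves: List[Dict[str, Any]] = [
    {"sid": 22, "bid": 100},
    {"sid": 58, "bid": 100},
    {"sid": 31, "bid": 101},
]
sailors: List[Dict[str, Any]] = [
    {"sid": 22, "sname": "dustin", "rating": 7},
    {"sid": 31, "sname": "lubber", "rating": 8},
    {"sid": 58, "sname": "rusty", "rating": 3},
]

# Stand-in for a hash index on Reserves.bid: bid -> list of Reserves rows.
bid_index: Dict[int, List[Dict[str, Any]]] = defaultdict(list)
for r in reserves:
    bid_index[r["bid"]].append(r)

# Stand-in for a hash index on Sailors.sid; sid is a key, so one row per sid.
sid_index: Dict[int, Dict[str, Any]] = {s["sid"]: s for s in sailors}

def run_plan(bid: int, min_rating: int) -> Iterator[str]:
    """Index nested loops join with selection and projection applied on the fly."""
    for r in bid_index.get(bid, []):           # selection bid=... via index
        s = sid_index.get(r["sid"])            # probe the index on Sailors.sid
        if s is not None and s["rating"] > min_rating:   # on-the-fly selection
            yield s["sname"]                   # on-the-fly projection

result = list(run_plan(bid=100, min_rating=5))
```

Filtering Sailors first would produce an unindexed intermediate result, so the probe on sid could no longer be used; keeping the filter after the probe preserves the index lookup.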
Summary
• A query is evaluated by converting it to a tree of
operators and evaluating the operators in the tree.
• There are several alternative evaluation
algorithms for each relational operator.
• Query evaluation must compare alternative plans
based on their estimated costs.
• Must understand query optimization in order to
fully understand the performance impact of a
given database design on a query workload.
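Comparing plans by estimated cost can be illustrated with the figures from the examples in these slides. The dictionary below is only a summary of those numbers, with descriptive labels invented for this sketch; a real optimizer would generate and cost the plans itself.

```python
# Estimated costs in page I/Os, taken from the example plans in these slides.
estimated_cost = {
    "simple nested loops, selections on the fly": 500 + 500 * 1000,
    "push selections, sort-merge join": 4060,
    "push selections, block nested loops join": 2770,
    "index on Reserves.bid, index nested loops on Sailors.sid": 1210,
}

# Choose the plan with the least estimated cost.
best_plan = min(estimated_cost, key=estimated_cost.get)
best_cost = estimated_cost[best_plan]
```

All four plans compute the same answer; the index-based plan is the cheapest of the four by estimated I/O.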