CS4432: Database Systems II

Query Rewrite
Query Re-Writing
Query in SQL

Query Plan in Algebra (logical)

Other Query Plans in Algebra (logical)
How does the optimizer generate
equivalent query plans?
Example
SELECT B, D FROM R, S WHERE R.A = 'c' AND S.E = 2 AND R.C = S.C
Plan 1
Plan 2
Plan 3
Relational Algebra Optimization
• Set of rules to apply, called “transformation rules” or “algebraic laws”
• What are transformation rules?
– Preserve equivalence of plans
• That is, must produce the same answer
• What are good transformations?
– Reduce query execution costs
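To make "same answer, lower cost" concrete, here is a minimal sketch that evaluates the example query in two ways on toy data. The schemas R(A, B, C) and S(C, D, E), the sample rows, and the function names are assumptions made for illustration; the slides only give the query text.

```python
from itertools import product

# Hypothetical sample data: R(A, B, C) and S(C, D, E) as lists of dicts.
R = [{"A": "c", "B": 1, "C": 10}, {"A": "d", "B": 2, "C": 10}, {"A": "c", "B": 3, "C": 20}]
S = [{"C": 10, "D": "x", "E": 2}, {"C": 20, "D": "y", "E": 5}, {"C": 10, "D": "z", "E": 2}]

def plan_product_then_select(R, S):
    """Cross product first, then one big selection, then the projection on B, D."""
    return sorted(
        (r["B"], s["D"])
        for r, s in product(R, S)
        if r["A"] == "c" and s["E"] == 2 and r["C"] == s["C"]
    )

def plan_select_then_join(R, S):
    """Push both selections below the join, join on C, then project on B, D."""
    r_sel = [r for r in R if r["A"] == "c"]
    s_sel = [s for s in S if s["E"] == 2]
    return sorted((r["B"], s["D"]) for r in r_sel for s in s_sel if r["C"] == s["C"])

result_1 = plan_product_then_select(R, S)
result_2 = plan_select_then_join(R, S)
print(result_1 == result_2)
```

Both functions return the same rows, but the second one only pairs up the tuples that survive the selections, which is the kind of saving a good transformation aims for.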
Relational Operators (revisited)
• Selection Basics
– Idempotent
– Commutative
• Selection Conjunctions
– Useful when pruning
• Selection Disjunctions
– Equivalent to UNIONS
Rules: Selection and Binary Operators
• Must push selection to both arguments:
– σ_C(R ∪ S) = σ_C(R) ∪ σ_C(S)
• Must push to first arg, optional for 2nd:
– σ_C(R − S) = σ_C(R) − S
– σ_C(R − S) = σ_C(R) − σ_C(S)
• Push to at least one arg with all attributes mentioned in C:
– product, natural join, theta join, intersection
– e.g., σ_C(R × S) = σ_C(R) × S, if R has all the attributes in C
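The push-down rules for union, difference, and product can be checked on small sets. This is only a sketch under set semantics; the relations R, S, T, the condition c, and the helper names are hypothetical.

```python
# Hypothetical relations with schema (x, y), stored as sets of tuples.
R = {(1, "a"), (2, "b"), (3, "c")}
S = {(2, "b"), (4, "d")}
T = {("p",), ("q",)}  # a second relation with a different schema, for the product

def select(pred, rel):
    """sigma_pred(rel) under set semantics."""
    return {t for t in rel if pred(t)}

def cross(X, Y):
    """Cartesian product: concatenate every tuple of X with every tuple of Y."""
    return {x + y for x in X for y in Y}

def c(t):
    """Condition C; it only mentions the first attribute, which belongs to R."""
    return t[0] >= 2

# Union: the selection goes to both arguments.
union_ok = select(c, R | S) == select(c, R) | select(c, S)

# Difference: pushing to the first argument is required, the second is optional.
diff_first_only = select(c, R - S) == select(c, R) - S
diff_both = select(c, R - S) == select(c, R) - select(c, S)

# Product: C only uses attributes of R, so it is pushed to R alone.
product_ok = select(c, cross(R, T)) == cross(select(c, R), T)

print(union_ok, diff_first_only, diff_both, product_ok)
```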
Rules: Natural Join Rewriting
R ⋈ S = S ⋈ R
(R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)
Can also write as trees, e.g.:
[Figure: join trees for (R ⋈ S) ⋈ T and R ⋈ (S ⋈ T)]
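Commutativity and associativity of the natural join can be tried out on toy relations. In this sketch rows are dicts keyed by attribute name, so column order does not matter; the relations R(A, B), S(B, C), T(C, D) and the helper names are assumptions for illustration.

```python
def natural_join(X, Y):
    """Natural join of two relations given as lists of dicts (set semantics)."""
    out = set()
    for x in X:
        for y in Y:
            common = x.keys() & y.keys()
            if all(x[k] == y[k] for k in common):
                # A frozenset of (attribute, value) pairs ignores column order.
                out.add(frozenset({**x, **y}.items()))
    return out

def as_rows(rel):
    """Turn a set of frozenset rows back into a list of dicts for re-joining."""
    return [dict(row) for row in rel]

R = [{"A": 1, "B": 2}, {"A": 3, "B": 4}]
S = [{"B": 2, "C": 5}, {"B": 4, "C": 6}]
T = [{"C": 5, "D": 7}, {"C": 9, "D": 8}]

commutative = natural_join(R, S) == natural_join(S, R)
left_deep = natural_join(as_rows(natural_join(R, S)), T)    # (R join S) join T
right_deep = natural_join(R, as_rows(natural_join(S, T)))   # R join (S join T)
associative = left_deep == right_deep

print(commutative, associative)
```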
Rules: Select
Conjunction predicates:
σ_{p1 ∧ p2}(R) = σ_p1[σ_p2(R)]
Disjunction predicates:
σ_{p1 ∨ p2}(R) = [σ_p1(R)] ∪ [σ_p2(R)]
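The selection properties (cascade of conjunctions, commutativity, idempotence, disjunction as union) can be verified on a small set. This is a sketch under set semantics; the relation R and the predicates p1 and p2 are hypothetical.

```python
# Hypothetical single-attribute relation under set semantics.
R = {1, 2, 3, 4, 5, 6}

def select(pred, rel):
    """sigma_pred(rel): keep the tuples that satisfy pred."""
    return {t for t in rel if pred(t)}

def p1(t):
    return t > 2

def p2(t):
    return t % 2 == 0

# Conjunction: one combined selection equals a cascade of two selections.
conjunction_ok = select(lambda t: p1(t) and p2(t), R) == select(p1, select(p2, R))

# Commutativity: the order of the cascade does not matter.
commutative_ok = select(p1, select(p2, R)) == select(p2, select(p1, R))

# Idempotence: applying the same selection twice changes nothing.
idempotent_ok = select(p1, select(p1, R)) == select(p1, R)

# Disjunction: an OR predicate equals the set union of the two selections.
disjunction_ok = select(lambda t: p1(t) or p2(t), R) == select(p1, R) | select(p2, R)

print(conjunction_ok, commutative_ok, idempotent_ok, disjunction_ok)
```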
Bags vs. Sets
R = {a,a,b,b,b,c}
S = {b,b,c,c,d}
What about union R ∪ S = ?
• Option 1: SUM the occurrences
R ∪ S = {a,a,b,b,b,b,b,c,c,c,d}
• Option 2: MAX of the occurrences
R ∪ S = {a,a,b,b,b,c,c,d}
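Both options can be reproduced with Python's `collections.Counter`, used here as a stand-in for a bag: `+` adds the occurrence counts and `|` takes their maximum. This is only an illustrative sketch, not how a DBMS stores bags.

```python
from collections import Counter

# The bags from the slide, with one character per element.
R = Counter("aabbbc")   # {a,a,b,b,b,c}
S = Counter("bbccd")    # {b,b,c,c,d}

union_sum = R + S       # Option 1: sum the occurrences
union_max = R | S       # Option 2: max of the occurrences

print("".join(sorted(union_sum.elements())))
print("".join(sorted(union_max.elements())))
```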
CS 4432
logical query rewriting - lecture 15
Bags vs. Sets
Which option makes this rule work?
σ_{p1 ∨ p2}(R) = σ_p1(R) ∪ σ_p2(R)
Example: R = {a,a,b,b,b,c}
p1 satisfied by a, b; p2 satisfied by b, c
Let us try MAX():
σ_{p1 ∨ p2}(R) = {a,a,b,b,b,c}
σ_p1(R) = {a,a,b,b,b}
σ_p2(R) = {b,b,b,c}
σ_p1(R) ∪ σ_p2(R) = {a,a,b,b,b,c} (matching)
Bags vs. Sets
Which option makes this rule work?
σ_{p1 ∨ p2}(R) = σ_p1(R) ∪ σ_p2(R)
Example: R = {a,a,b,b,b,c}
p1 satisfied by a, b; p2 satisfied by b, c
Let us try SUM():
σ_{p1 ∨ p2}(R) = {a,a,b,b,b,c}
σ_p1(R) = {a,a,b,b,b}
σ_p2(R) = {b,b,b,c}
σ_p1(R) ∪ σ_p2(R) = {a,a,b,b,b,b,b,b,c} (not matching)
Bag Semantics in DBMSs
• Usually the “SUM” option for bag union is more meaningful
• Many DBMSs implement this semantics
• Great care must be taken, as some rules cannot be used for bags!
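The disjunction rule is one such case. The sketch below replays the slide example with `collections.Counter` as a bag: the rule holds when union takes the MAX of the counts and fails when it takes the SUM. The helper `select` and the predicate names are hypothetical.

```python
from collections import Counter

R = Counter("aabbbc")   # the bag {a,a,b,b,b,c}

def select(pred, bag):
    """Bag selection: keep every copy of each element that satisfies pred."""
    return Counter({v: n for v, n in bag.items() if pred(v)})

def p1(v):
    return v in "ab"    # satisfied by a, b

def p2(v):
    return v in "bc"    # satisfied by b, c

lhs = select(lambda v: p1(v) or p2(v), R)    # sigma_{p1 OR p2}(R)
rhs_max = select(p1, R) | select(p2, R)      # union as MAX of counts
rhs_sum = select(p1, R) + select(p2, R)      # union as SUM of counts

print(lhs == rhs_max)   # the rule holds with MAX
print(lhs == rhs_sum)   # the rule breaks with SUM: b is counted six times
```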
Rules: Project
Let:
X = set of attributes
Y = set of attributes
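XY = X ∪ Y
π_XY(R) = π_X[π_Y(R)] does not hold in general: π_Y(R) has already dropped the attributes of X that are not in Y
The valid cascade is π_X(R) = π_X[π_Y(R)] when X ⊆ Y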
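A projection cascade can be tried on a toy relation: projecting onto X after first projecting onto a superset Y of X gives the same rows as projecting onto X directly. The relation R(A, B, C) and the helper `project` are assumptions for illustration, under set semantics.

```python
# Hypothetical relation R(A, B, C); rows are dicts, set semantics.
R = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 1, "B": 5, "C": 3},
    {"A": 4, "B": 2, "C": 6},
]

def project(attrs, rel):
    """pi_attrs(rel): keep only attrs and drop duplicate rows."""
    seen = {tuple((a, row[a]) for a in sorted(attrs)) for row in rel}
    return [dict(t) for t in sorted(seen)]

X = {"A"}
Y = {"A", "B"}          # X is a subset of Y

direct = project(X, R)                  # pi_X(R)
cascaded = project(X, project(Y, R))    # pi_X[pi_Y(R)]

print(direct == cascaded)
```

If X contained an attribute that Y lacks, the inner projection would already have discarded it, so the outer projection could not be evaluated.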
Rules: σ + ⋈ combined
Let p = predicate with only R attributes
q = predicate with only S attributes
m = predicate with both R and S attributes
σ_p(R ⋈ S) = [σ_p(R)] ⋈ S
σ_q(R ⋈ S) = R ⋈ [σ_q(S)]
σ_m(R ⋈ S) = no change (m is a join predicate)
Always a good idea to push selection down
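The benefit of pushing a selection below a join shows up in how many tuples reach the join. A minimal sketch, assuming hypothetical relations R(A, C) and S(C, E) and a predicate p over R's attribute A only:

```python
def join(R, S):
    """Natural join on the shared attribute C; rows are (A, C) and (C, E) tuples."""
    return {(a, c, e) for (a, c) in R for (c2, e) in S if c == c2}

R = {(i, i % 3) for i in range(9)}      # hypothetical R(A, C), 9 tuples
S = {(c, 10 * c) for c in range(3)}     # hypothetical S(C, E), 3 tuples

def p(a):
    """Predicate that only looks at R's attribute A."""
    return a < 3

late = {t for t in join(R, S) if p(t[0])}   # sigma_p(R join S): join sees all of R
R_small = {t for t in R if p(t[0])}         # sigma_p(R)
early = join(R_small, S)                    # [sigma_p(R)] join S: join sees 3 tuples

print(late == early, len(R), len(R_small))
```

Both plans return the same tuples, but the pushed-down plan feeds the join a third of R in this toy setup.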
Rules: σ + ⋈ combined
Let p = predicate with only R attributes
q = predicate with only S attributes
m = predicate with both R and S attributes
Rule can be derived !
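One rule that can be derived from the single-predicate push-down rules is the push-down of a conjunction p ∧ q. The derivation below is a sketch in standard textbook notation and is not necessarily the exact rule the slide has in mind; it assumes, as defined above, that p mentions only R attributes and q only S attributes.

```latex
\begin{align*}
\sigma_{p \wedge q}(R \bowtie S)
  &= \sigma_p\bigl[\sigma_q(R \bowtie S)\bigr]
     && \text{split the conjunction into a cascade} \\
  &= \sigma_p\bigl[R \bowtie \sigma_q(S)\bigr]
     && \text{push } \sigma_q \text{ to } S \\
  &= [\sigma_p(R)] \bowtie [\sigma_q(S)]
     && \text{push } \sigma_p \text{ to } R
\end{align*}
```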
Rules: σ + ⋈ combined
Let p = predicate with only R attributes
q = predicate with only S attributes
m = predicate with both R and S attributes
What about these ones??
Rules: σ + π combined
Let x = subset of R attributes
z = attributes in predicate P (subset of R attributes)
Must ensure the z attributes are projected
Usually not that effective, unless R contains really large attributes that we want to avoid reading
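With x and z as defined above, the standard textbook form of the combined projection and selection rule can be written as follows. It is assumed here to be the rule these definitions set up; the inner projection keeps z so that the predicate P can still be evaluated.

```latex
\[
\pi_x\bigl[\sigma_P(R)\bigr]
  = \pi_x\bigl\{\sigma_P\bigl[\pi_{xz}(R)\bigr]\bigr\}
\]
```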
Rules: ⋈ + π combined
Let x = subset of R attributes
y = subset of S attributes
z = intersection of R,S attributes (Join columns)
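With x, y, and z as defined above, the standard textbook form of the combined projection and join rule can be written as follows. It is assumed here to be the rule these definitions set up; each side keeps the join columns z so that the join can still match tuples.

```latex
\[
\pi_{xy}(R \bowtie S)
  = \pi_{xy}\bigl\{[\pi_{xz}(R)] \bowtie [\pi_{yz}(S)]\bigr\}
\]
```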
Sometimes It’s Tricky
• Suppose we have relations
– StarsIn(title,year,starName)
– Movie(title,year,len,inColor,studioName)
• and a view
– CREATE VIEW MoviesOf1996 AS
SELECT *
FROM Movie
WHERE year = 1996;
• and the query
– SELECT starName, studioName
FROM MoviesOf1996 NATURAL JOIN StarsIn;
An Improved Logical Query Plan
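One way to arrive at an improved plan for the MoviesOf1996 query, sketched here in algebra under the assumption that it follows the usual textbook treatment of this example: the selection on year is first pulled up above the join, and then pushed down to both inputs. This works because year is a join attribute that appears in both Movie and StarsIn.

```latex
\begin{align*}
 & \pi_{\mathit{starName},\,\mathit{studioName}}
   \bigl(\sigma_{\mathit{year}=1996}(\mathit{Movie}) \bowtie \mathit{StarsIn}\bigr)
   && \text{view expanded} \\
={}& \pi_{\mathit{starName},\,\mathit{studioName}}
   \bigl(\sigma_{\mathit{year}=1996}(\mathit{Movie} \bowtie \mathit{StarsIn})\bigr)
   && \text{selection pulled up} \\
={}& \pi_{\mathit{starName},\,\mathit{studioName}}
   \bigl(\sigma_{\mathit{year}=1996}(\mathit{Movie}) \bowtie
         \sigma_{\mathit{year}=1996}(\mathit{StarsIn})\bigr)
   && \text{selection pushed to both sides}
\end{align*}
```

The tricky part is the middle step: moving the selection up looks like the opposite of push-down, yet it is what allows the filter to reach StarsIn as well.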
Summary of Query Rewrite
• Transformation rules to create equivalent query plans
• Check textbook for more rules
• Selection push-down is (almost) always good
• Sometimes project-push-down is good
Both reduce the size of intermediate results as early as possible
• Pushing selection all the way down enables using indexes
• Order among join relations
– Affects which one is outer or inner