On the Optimal Ordering of Maps and Selections under Factorization

ICDE 2005
Thomas Neumann
Sven Helmer
Guido Moerkotte
Universität Mannheim
[tneumann|helmer|moerkotte]@informatik.uni-mannheim.de
Sven Helmer, April 6th 2005
Introduction
■ User-defined functions (UDFs) can be found in all major commercial DBMSs
■ They enhance the functionality of the DBMS
■ They allow modularization of code
Example Query
SELECT p.name
FROM pictures p
WHERE coarseness(texturediff(p.image, q)) < 1.5
  AND contrast(texturediff(p.image, q)) < 0.3
  AND red(colordiff(p.image, q)) < 0.1
  AND green(colordiff(p.image, q)) < 0.4
  AND blue(colordiff(p.image, q)) < 0.2
  AND containscircle(shapediff(p.image, q)) > 0.8;
Optimizing Queries
■ Two challenges for the optimizer when handling UDFs in query predicates:
  ◆ Vast differences in evaluation costs
  ◆ Eliminating common subexpressions (factorization)
■ Ordering UDF calls properly is important for query evaluation costs
Overview
■ Formalization
■ Complexity
■ Algorithms
■ Evaluation
■ Conclusion
Operators
■ Two operators are important in this context:
  ◆ Selection operator: $\sigma_p(R) := \{t \mid t \in R,\ p(t)\}$
  ◆ Map operator: $\chi_{a:u}(R) := \{t \circ [a : v] \mid t \in R,\ v = u(t)\}$
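The two operator definitions can be illustrated with a minimal Python sketch. Tuples are modeled as dictionaries, and the names `select`, `map_op`, `Row`, and the sample data are assumptions made for illustration only; they do not come from the slides.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Row = Dict[str, Any]  # hypothetical tuple representation: attribute name -> value


def select(p: Callable[[Row], bool], rows: Iterable[Row]) -> Iterator[Row]:
    """Selection: keep exactly those tuples t of the input for which p(t) holds."""
    return (t for t in rows if p(t))


def map_op(a: str, u: Callable[[Row], Any], rows: Iterable[Row]) -> Iterator[Row]:
    """Map: extend every tuple t with a new attribute a whose value is u(t)."""
    return ({**t, a: u(t)} for t in rows)


# Made-up input relation: compute an attribute once, then filter on it.
pictures = [{"name": "a", "size": 3}, {"name": "b", "size": 10}]
extended = map_op("double", lambda t: 2 * t["size"], pictures)
result = list(select(lambda t: t["double"] < 10, extended))
```

Here the map materializes the value of the (stand-in) UDF as an attribute, so any later selection can reuse it without calling the function again.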
UDFs
■ Map adds an attribute containing the result of the UDF call to each tuple
■ For our example:
[Figure: operator graph for the example query; "→" reads "depends on"]
σx<1.5 → χx:coarse → χtd:texture
σy<0.3 → χy:contrast → χtd:texture
σr<0.1 → χr:red → χcd:color
σg<0.4 → χg:green → χcd:color
σb<0.2 → χb:blue → χcd:color
σc>0.8 → χc:circle → χsd:shape
Goal
■ Find an optimal order for evaluating the predicates considering factorization
■ Boils down to finding a permutation of the selections with optimal cost (maps are ordered implicitly)
■ Easy when no factorization is involved: just rank the selections by $r = \frac{s - 1}{c}$ (s: selectivity, c: cost)
■ Doesn't work under factorization (see example query, details in paper):
  ◆ Evaluation cost with ranking: 10002.43
  ◆ Optimal cost: 6111.36
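For the case without factorization, rank-based ordering can be sketched as follows. The selectivities and costs are made-up values, and the cost model (each selection pays its cost for every tuple that survived the earlier selections, with independent selectivities) is an assumption for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical selections: name -> (selectivity s, cost per tuple c).
SELECTIONS: Dict[str, Tuple[float, float]] = {
    "A": (0.5, 10.0),
    "B": (0.1, 100.0),
    "C": (0.9, 1.0),
}


def rank(name: str) -> float:
    """Rank r = (s - 1) / c; smaller (more negative) ranks are evaluated first."""
    s, c = SELECTIONS[name]
    return (s - 1.0) / c


def sequence_cost(order: List[str]) -> float:
    """Cost per input tuple when no maps are shared between selections."""
    cost, card = 0.0, 1.0
    for name in order:
        s, c = SELECTIONS[name]
        cost += card * c   # pay c for every tuple still alive
        card *= s          # only a fraction s survives this selection
    return cost


order = sorted(SELECTIONS, key=rank)
cost = sequence_cost(order)
```

Once selections share maps, the cost of a selection depends on which maps earlier selections have already applied, so a fixed per-selection rank is no longer well defined.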
Motivation
■ We tried to develop an efficient algorithm to find the optimal order under factorization
■ This motivated us to take a closer look at the problem itself:
  ◆ Ordering selections and maps optimally under factorization is NP-hard!
  ◆ Here: brief sketch of the proof
Sketch of Proof
■ We can reduce the clique problem ("Is there a clique of size k in a graph G?") to our problem
■ Every node in G is mapped to a selection operator
■ Every edge in G is represented by a map operator that is shared by the associated selections
■ All selections that do not have at least d map operators are assigned one additional map operator
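The structural part of this transformation can be sketched in Python. The function name `build_instance`, the naming scheme for maps, the sample graph, and the value d = 2 are assumptions for illustration; the assignment of costs and selectivities, which the proof depends on, is not given on the slides and is left out.

```python
from typing import Dict, Iterable, Set, Tuple


def build_instance(nodes: Iterable[int],
                   edges: Iterable[Tuple[int, int]],
                   d: int) -> Dict[int, Set[str]]:
    """Return, for each selection (one per node), the set of maps it needs."""
    maps: Dict[int, Set[str]] = {v: set() for v in nodes}
    # Every edge becomes one map shared by the two adjacent selections.
    for u, v in edges:
        shared = f"chi{u}{v}"
        maps[u].add(shared)
        maps[v].add(shared)
    # Selections with fewer than d maps get one additional private map.
    for v in maps:
        if len(maps[v]) < d:
            maps[v].add(f"chi{v}")
    return maps


# Hypothetical graph with four nodes; d = 2 is chosen only for this example.
instance = build_instance([1, 2, 3, 4], [(1, 2), (1, 3), (2, 3), (1, 4)], d=2)
```

In this example only node 4 has fewer than two incident edges, so it is the only selection that receives a private map.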
Sketch of Proof(2)
[Figure: example graph G with nodes 1, 2, 3, 4 and edges 1-2, 1-3, 2-3, 1-4, together with the resulting operators]
σ1: χ12, χ13, χ14
σ2: χ12, χ23
σ3: χ13, χ23
σ4: χ14, χ4
Sketch of Proof(3)
■ Crucial point: costs and selectivities are assigned in a certain way
  ◆ If there is a clique in G, its nodes will appear as a prefix of a minimal cost solution
■ Transform G into a χ-σ problem
■ Check the first k elements of the minimal cost solution for a k-clique in G
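The final check is cheap compared to finding the minimal cost solution. A minimal sketch, assuming the first k selections have already been mapped back to their graph nodes (the function name `is_clique` and the sample graph are hypothetical):

```python
from itertools import combinations
from typing import Iterable, Sequence, Tuple


def is_clique(nodes: Sequence[int], edges: Iterable[Tuple[int, int]]) -> bool:
    """True if every pair of the given nodes is connected by an edge."""
    edge_set = {frozenset(e) for e in edges}  # undirected edges
    return all(frozenset(pair) in edge_set for pair in combinations(nodes, 2))


# Hypothetical graph; the prefixes stand for the first k = 3 nodes of a solution.
edges = [(1, 2), (1, 3), (2, 3), (1, 4)]
prefix_is_clique = is_clique([1, 2, 3], edges)
other_is_clique = is_clique([1, 2, 4], edges)
```

The check inspects k(k-1)/2 node pairs, so it runs in polynomial time.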
Algorithms
■ Find the optimal order under factorization without generating all permutations of selections
■ Two basic algorithms:
  ◆ Generating permutations
  ◆ Memoization
■ Enhanced by two (orthogonal) techniques:
  ◆ Pruning
  ◆ Exploiting connected components
Generating Permutations
// Input: S = {σ1, ..., σn}
// Output: optimal sequence
perm(S) {
  if (|S| > 0) {
    for (each σi in S)
      Ci = σi ◦ perm(S \ {σi});
    return Ci with smallest cost;
  } else
    return empty sequence;
}
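A runnable Python sketch of this exhaustive search is given below. The instance and the cost model are assumptions, not taken from the slides: each map has a cost per tuple, a selection first applies the maps it needs that are not yet available, and selectivities are independent. The extra parameters `applied` and `card` carry the state that the cost model needs.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical instance: map -> cost per tuple; selection -> (selectivity, maps needed).
MAP_COST: Dict[str, float] = {"td": 50.0, "cd": 20.0, "x": 1.0, "y": 1.0, "r": 1.0}
SELECTIONS: Dict[str, Tuple[float, Set[str]]] = {
    "sx": (0.5, {"td", "x"}),
    "sy": (0.3, {"td", "y"}),
    "sr": (0.1, {"cd", "r"}),
}


def perm(remaining: FrozenSet[str],
         applied: FrozenSet[str] = frozenset(),
         card: float = 1.0) -> Tuple[float, List[str]]:
    """Try every selection as the next one; return (cost, cheapest sequence)."""
    if not remaining:
        return 0.0, []
    best = None
    for sel in sorted(remaining):
        s, needs = SELECTIONS[sel]
        # Only maps that are not yet applied have to be paid for (factorization).
        step = card * sum(MAP_COST[m] for m in needs - applied)
        rest_cost, rest_seq = perm(remaining - {sel}, applied | needs, card * s)
        candidate = (step + rest_cost, [sel] + rest_seq)
        if best is None or candidate[0] < best[0]:
            best = candidate
    return best


cost, sequence = perm(frozenset(SELECTIONS))
```

The search visits all n! orders, which is why the later slides add memoization and pruning.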
Memoization
// Input: set Y of already applied map operators, S = {σ1, ..., σn}
// Output: optimal sequence
// Xσi: the map operators required by σi
memo(Y, S) {
  if (|S| > 0) {
    if (lookup of (Y, S) in hash table is successful)
      return hash table entry with optimized plan;
    else {
      for (each σi in S)
        Ci = σi ◦ memo(Xσi ∪ Y, S \ {σi});
      store Ci with smallest cost in hash table under entry for (Y, S);
      return Ci with smallest cost;
    }
  } else
    return empty sequence;
}
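The memoized variant can be sketched in Python with a dictionary as the hash table. The instance and the cost model (cost per tuple for each map, independent selectivities) are assumptions for illustration. Stored costs are relative to one tuple entering the subproblem, so an entry stays valid however the state (Y, S) was reached.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical instance: map -> cost per tuple; selection -> (selectivity, maps needed).
MAP_COST: Dict[str, float] = {
    "td": 50.0, "cd": 20.0, "x": 1.0, "y": 1.0, "r": 1.0, "g": 1.0,
}
SELECTIONS: Dict[str, Tuple[float, FrozenSet[str]]] = {
    "sx": (0.5, frozenset({"td", "x"})),
    "sy": (0.3, frozenset({"td", "y"})),
    "sr": (0.1, frozenset({"cd", "r"})),
    "sg": (0.4, frozenset({"cd", "g"})),
}

Plan = Tuple[float, Tuple[str, ...]]
table: Dict[Tuple[FrozenSet[str], FrozenSet[str]], Plan] = {}


def memo(applied: FrozenSet[str], remaining: FrozenSet[str]) -> Plan:
    """Return (cost per incoming tuple, cheapest sequence) for the state (Y, S)."""
    if not remaining:
        return 0.0, ()
    key = (applied, remaining)
    if key in table:                      # successful lookup: reuse the stored plan
        return table[key]
    best = None
    for sel in sorted(remaining):
        s, needs = SELECTIONS[sel]
        step = sum(MAP_COST[m] for m in needs - applied)
        rest_cost, rest_seq = memo(applied | needs, remaining - {sel})
        candidate = (step + s * rest_cost, (sel,) + rest_seq)
        if best is None or candidate[0] < best[0]:
            best = candidate
    table[key] = best
    return best


cost, sequence = memo(frozenset(), frozenset(SELECTIONS))
```

In this sketch the table holds at most one entry per non-empty subset of the selections, so the work is bounded by the number of subsets rather than the number of permutations, at the price of the memory for the table.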
Pruning
■ The costs of each prefix of a sequence with minimal costs are also minimal
■ When we have two alternatives
  ◆ p ◦ σi ◦ σk
  ◆ p ◦ σk ◦ σi
  we only use the cheaper one and discard the other
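One possible reading of this rule is sketched below; the slides do not say how pruning is integrated into the search, so the control flow is an assumption, as are the instance and the cost model (cost per tuple for each map, independent selectivities). Before extending a prefix that ends in σi by σk, the sketch compares it with the variant in which the two are swapped and continues only if the swap is not cheaper. Ties are broken by name so that one of the two orders always survives.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple

# Hypothetical instance: map -> cost per tuple; selection -> (selectivity, maps needed).
MAP_COST: Dict[str, float] = {"td": 50.0, "cd": 20.0, "x": 1.0, "y": 1.0, "r": 1.0}
SELECTIONS: Dict[str, Tuple[float, Set[str]]] = {
    "sx": (0.5, {"td", "x"}),
    "sy": (0.3, {"td", "y"}),
    "sr": (0.1, {"cd", "r"}),
}


def seq_cost(seq: Tuple[str, ...]) -> float:
    """Cost of evaluating the selections in the given order, sharing maps."""
    cost, card, applied = 0.0, 1.0, set()
    for sel in seq:
        s, needs = SELECTIONS[sel]
        cost += card * sum(MAP_COST[m] for m in needs - applied)
        applied |= needs
        card *= s
    return cost


def perm_pruned(prefix: Tuple[str, ...], remaining: FrozenSet[str],
                stats: Dict[str, int]) -> Optional[Tuple[float, Tuple[str, ...]]]:
    """Exhaustive search that skips a prefix if swapping its last two is better."""
    stats["calls"] += 1
    if not remaining:
        return seq_cost(prefix), prefix
    best = None
    for sel in sorted(remaining):
        if prefix:
            last = prefix[-1]
            kept = seq_cost(prefix + (sel,))
            swapped = seq_cost(prefix[:-1] + (sel, last))
            if swapped < kept or (swapped == kept and sel < last):
                continue  # the swapped prefix reaches the same state more cheaply
        result = perm_pruned(prefix + (sel,), remaining - {sel}, stats)
        if result is not None and (best is None or result[0] < best[0]):
            best = result
    return best  # None if every extension of this prefix was pruned


stats = {"calls": 0}
cost, sequence = perm_pruned((), frozenset(SELECTIONS), stats)
```

Both alternatives apply the same maps and leave the same number of tuples, so the cost of everything that follows is identical and dropping the more expensive prefix cannot lose the optimum. The sketch recomputes prefix costs from scratch for brevity.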
Exploiting Connected Components
[Figure: operator graph of the example query with three connected components: {σx<1.5, σy<0.3} sharing χtd:texture; {σr<0.1, σg<0.4, σb<0.2} sharing χcd:color; {σc>0.8} on χsd:shape]
■ Each connected component is brought into an optimal order
■ Merging the (normalized) components can be done with a ranking function
■ This technique has also been applied to join ordering
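Finding the components is the first step of this technique and can be sketched as follows. Two selections are treated as connected if they share at least one map. The short operator names are assumptions modeled on the example query, and the normalization and rank-based merging of the ordered components are not shown.

```python
from typing import Dict, List, Set

# Hypothetical encoding of the example query: selection -> maps it depends on.
NEEDS: Dict[str, Set[str]] = {
    "sx": {"td", "x"}, "sy": {"td", "y"},
    "sr": {"cd", "r"}, "sg": {"cd", "g"}, "sb": {"cd", "b"},
    "sc": {"sd", "c"},
}


def components(needs: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group selections that are linked, directly or transitively, by shared maps."""
    result: List[Set[str]] = []
    unvisited = set(needs)
    while unvisited:
        start = unvisited.pop()
        component, frontier = {start}, [start]
        while frontier:
            current = frontier.pop()
            # Selections sharing a map with `current` belong to the same component.
            linked = {o for o in unvisited if needs[o] & needs[current]}
            unvisited -= linked
            component |= linked
            frontier.extend(linked)
        result.append(component)
    return result


groups = sorted(sorted(c) for c in components(NEEDS))
```

Each group can then be optimized on its own with one of the basic algorithms, which keeps the exponential part of the search limited to the size of the largest component.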
Evaluation
■ All variants were implemented
■ We measured performance in terms of (recursive) calls
■ We are mainly interested in the deterioration of our algorithms with increasing factorization
Measurements
[Figure: number of calls vs. number of maps (0 to 10) for 5 selections; curves: perm, perm-p, perm-pc, memo, memo-c]
Evaluation results
■ For practical cases we do much better than generating all permutations
■ Usually, memoization is better than generating permutations
  ◆ This is mainly a trade-off
Conclusion
■ We investigated the problem of optimally ordering selections and map operators under factorization for the first time
■ We have shown NP-hardness
■ There are better algorithms for computing an optimal ordering than generating all permutations
Future Work
■ Not shown here: ordering joins, selections, and maps optimally
  ◆ SIGMOD 2005
  ◆ Maybe VLDB 2005