Chapter 15 Algorithms for Query Processing and Optimization ICS

Chapter 15
Algorithms for Query Processing
and Optimization
ICS 424 Advanced Database Systems
Dr. Muhammad Shafique
ICS 424 - 01 (072)
Query Processing and
Optimization
1
Outline
•
•
•
•
•
•
Introduction
Processing a query
SQL queries and relational algebra
Implementing basic query operations
Heuristics-based query optimization
Overview of query optimization in Oracle
ICS 424 - 01 (072)
Query Processing and
Optimization
2
Material Covered from Chapter 15
•
•
•
•
•
•
Pages 537, 538, 539
Section 15.1
Section 15.2
Section 15.6
Section 15.7
Section 15.9
ICS 424 - 01 (072)
Query Processing and
Optimization
3
Introduction to Query Processing
• Query optimization
• The process of choosing a suitable execution strategy
for processing a query.
• Two internal representations of a query:
• Query Tree
• Query Graph
ICS 424 - 01 (072)
Query Processing and
Optimization
4
Background Review
• DDL compiler
• DML compiler
• Runtime
database
processor
• System catalog
ICS 424 - 01 (072)
Query Processing and
Optimization
5
Processing a Query
• Tasks in processing a high-level query
1. Scanner scans the query and identifies the language tokens
2. Parser checks syntax of the query
3. The query is validated by checking that all attribute names and
relation names are valid
4. An intermediate internal representation for the query is created
(query tree or query graph)
5. Query execution strategy is developed
6. Query optimizer produces an execution plan
7. Code generator generates the object code
8. Runtime database processor executes the code
• Query processing and query optimization
ICS 424 - 01 (072)
Query Processing and
Optimization
6
Processing a Query
• Typical steps in processing a high-level query
1.
2.
3.
4.
5.
6.
7.
8.
9.
Query in a high-level query language like SQL
Scanning, parsing, and validation
Intermediate-form of query like query tree
Query optimizer
Execution plan
Query code generator
Object-code for the query
Run-time database processor
Results of query
ICS 424 - 01 (072)
Query Processing and
Optimization
7
ICS 424 - 01 (072)
Query Processing and
Optimization
8
SQL Queries and Relational Algebra
• SQL query is translated into an equivalent extended
relational algebra expression --- represented as a query tree
• In order to transform a given query into a query tree, the
query is decomposed into query blocks
• Query block:
• The basic unit that can be translated into the algebraic operators and
optimized.
• A query block contains a single SELECT-FROM-WHERE
expression, as well as GROUP BY and HAVING clause if these are
part of the block.
• The query optimizer chooses an execution plan for each
block
ICS 424 - 01 (072)
Query Processing and
Optimization
9
COMPANY Relational Database Schema (1)
ICS 424 - 01 (072)
Query Processing and
Optimization
10
COMPANY Relational Database Schema (2)
ICS 424 - 01 (072)
Query Processing and
Optimization
11
SQL Queries and Relational Algebra (1)
• Example
SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary > ( SELECT MAX(Salary)
FROM EMPLOYEE
WHERE Dno = 5
)
• Inner block and outer block
ICS 424 - 01 (072)
Query Processing and
Optimization
12
Translating SQL Queries into Relational Algebra
SELECT
FROM
WHERE
SELECT
FROM
WHERE
LNAME, FNAME
EMPLOYEE
SALARY > (
SELECT
FROM
WHERE
LNAME, FNAME
EMPLOYEE
SALARY > C
πLNAME, FNAME (σSALARY>C(EMPLOYEE))
ICS 424 - 01 (072)
SELECT
FROM
WHERE
MAX (SALARY)
EMPLOYEE
DNO = 5);
MAX (SALARY)
EMPLOYEE
DNO = 5
ℱMAX SALARY (σDNO=5 (EMPLOYEE))
Query Processing and
Optimization
13
SQL Queries and Relational Algebra (2)
• Uncorrelated nested queries Vs Correlated nested queries
• Example
Retrieve the name of each employee who works on all the projects
controlled by department number 5.
SELECT
FROM
WHERE (
ICS 424 - 01 (072)
FNAME, LNAME
EMPLOYEE
(SELECT
PNO
FROM
WORKS_ON
WHERE
SSN=ESSN)
CONTAINS
(SELECT
PNUMBER
FROM
PROJECT
WHERE
DNUM=5) )
Query Processing and
Optimization
14
SQL Queries and Relational Algebra (3)
• Example
For every project located in ‘Stafford’, retrieve the project number,
the controlling department number and the department manager’s
last name, address and birthdate.
• SQL query:
SELECT
FROM
WHERE
P.NUMBER,P.DNUM,E.LNAME, E.ADDRESS, E.BDATE
PROJECT AS P,DEPARTMENT AS D, EMPLOYEE AS E
P.DNUM=D.DNUMBER AND D.MGRSSN=E.SSN AND
P.PLOCATION=‘STAFFORD’;
• Relation algebra:
PNUMBER, DNUM, LNAME, ADDRESS, BDATE
(((PLOCATION=‘STAFFORD’(PROJECT))
DNUM=DNUMBER
ICS 424 - 01 (072)
(DEPARTMENT))
Query Processing and
Optimization
MGRSSN=SSN
(EMPLOYEE))
15
SQL Queries and Relational Algebra (4)
ICS 424 - 01 (072)
Query Processing and
Optimization
16
Implementing Basic Query Operations
• An RDBMS must provide implementation(s) for
all the required operations including relational
operators and more
• External sorting
• Sort-merge strategy
• Sorting phase
• Number of file blocks (b)
• Number of available buffers (nB)
• Runs --- (b / nB)
• Merging phase --- passes
• Degree of merging --- the number of runs that are merged
together in each pass
ICS 424 - 01 (072)
Query Processing and
Optimization
17
Algorithms for External Sorting (1)
• External sorting:
• Refers to sorting algorithms that are suitable for large files
of records stored on disk that do not fit entirely in main
memory, such as most database files.
• Sort-Merge strategy:
• Starts by sorting small subfiles (runs) of the main file and
then merges the sorted runs, creating larger sorted subfiles
that are merged in turn.
ICS 424 - 01 (072)
Query Processing and
Optimization
18
Algorithms
for
External
Sorting (2)
ICS 424 - 01 (072)
Query Processing and
Optimization
19
Algorithms for External Sorting (3)
• Analysis
Number of file blocks = b
Number of initial runs = nR
Available buffer space = nB
Sorting phase: nR = (b/nB)
Degree of merging:
dM = Min (nB-1, nR);
Number of passes:
nP = (logdM(nR))
Number of block accesses: (2 * b) + (2 * b * (logdM(nR)))
• Example done in the class
ICS 424 - 01 (072)
Query Processing and
Optimization
20
Implementing Basic Query Operations (cont.)
•
Estimates of selectivity
•
•
Selectivity is the ratio of the number of tuples that satisfy the
condition to the total number of tuples in the relation.
SELECT (  ) operator implementation
1.
2.
3.
4.
5.
6.
7.
8.
9.
Linear search
Binary search
Using a primary index (or hash key)
Using primary index to retrieve multiple records
Using clustering index to retrieve multiple records
Using a secondary index on an equality comparison
Conjunctive selection using an individual index
Conjunctive selection using a composite index
Conjunctive selection by intersection of record pointers
ICS 424 - 01 (072)
Query Processing and
Optimization
21
Implementing Basic Query Operations (cont.)
• JOIN operator implementation
1. Nested-loop join
2. Sort-merge join
3. Hash join
• Partition Hash join
• Hybrid hash join
•
•
•
•
PROJECT operator implementation
Set operator implementation
Implementing Aggregate operators/functions
Implementing OUTER JOIN
ICS 424 - 01 (072)
Query Processing and
Optimization
22
ICS 424 - 01 (072)
Query Processing and
Optimization
23
Buffer Space and Join performance
In the nested-loop join, it makes a difference which file is chosen for
the outer loop and which for the inner loop. If EMPLOYEE is used for
the outer loop, each block of EMPLOYEE is read once, and the entire
DEPARTMENT file (each of its blocks) is read once for each time we
read in ( nB - 2) blocks of the EMPLOYEE file. We get the following:
Total number of blocks accessed for outer file = bE
Number of times ( nB - 2) blocks of outer file are loaded =  bE/ nB – 2 
Total number of blocks accessed for inner file = bD * bE/ nB – 2 
Hence, we get the following total number of block accesses:
bE + ( bE/ nB – 2  * bD) = 2000 + ( (2000/5)  * 10) = 6000 blocks
On the other hand, if we use the DEPARTMENT records in the outer
loop, by symmetry we get the following total number of block
accesses:
bD + ( bD/ nB – 2  * bE) = 10 + ((10/5)  * 2000) = 4010 blocks
ICS 424 - 01 (072)
Query Processing and
Optimization
24
Implementing Basic Query Operations (cont.)
• Combining operations using pipelining
• Temporary files based processing
• Pipelining or stream-based processing
• Example: consider the execution of the following query
list of attributes( ( c1(R)  ( c2 (S))
ICS 424 - 01 (072)
Query Processing and
Optimization
25
General Transformation Rules for
Relational Algebra Operations
1. Cascade of : A conjunctive selection condition can be
broken up into a cascade (that is, a sequence) of
individual  operations:
 C1 AND C2 AND ….AND Cn (R) ≡  C1 (C2( …(Cn(R))…)
2. Commutativity of  : The  operation is commutative:
C1(C2(R)) ≡ C2(C1(R))
3. Cascade of : In a cascade (sequence) of  operations, all
but the last one can be ignored
4. Commuting  with : If the selection condition c
involves only those attributes A1, ..., An in the projection
list, the two operations can be commuted
• And more …
ICS 424 - 01 (072)
Query Processing and
Optimization
26
Heuristic-Based Query Optimization
•
Outline of heuristic algebraic optimization algorithm
1. Break up SELECT operations with conjunctive conditions into a
cascade of SELECT operations
2. Using the commutativity of SELECT with other operations, move
each SELECT operation as far down the query tree as is permitted
by the attributes involved in the select condition
3. Using commutativity and associativity of binary operations,
rearrange the leaf nodes of the tree
4. Combine a CARTESIAN PRODUCT operation with a subsequent
SELECT operation in the tree into a JOIN operation, if the
condition represents a join condition
5. Using the cascading of PROJECT and the commuting of
PROJECT with other operations, break down and move lists of
projection attributes down the tree as far as possible by creating
new PROJECT operations as needed
6. Identify sub-trees that represent groups of operations that can be
executed by a single algorithm
ICS 424 - 01 (072)
Query Processing and
Optimization
27
Heuristic-Based Query Optimization:
Example
• Query
"Find the last names of employees born after 1957
who work on a project named ‘Aquarius’."
• SQL
SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME=‘Aquarius’ AND PNUMBER=PNO
AND ESSN=SSN AND BDATE.‘1957-12-31’;
ICS 424 - 01 (072)
Query Processing and
Optimization
28
ICS 424 - 01 (072)
Query Processing and
Optimization
29
ICS 424 - 01 (072)
Query Processing and
Optimization
30
ICS 424 - 01 (072)
Query Processing and
Optimization
31
ICS 424 - 01 (072)
Query Processing and
Optimization
32
ICS 424 - 01 (072)
Query Processing and
Optimization
33
Overview of Query Optimization in Oracle
• Rule-based query optimization: the optimizer chooses execution plans
based on heuristically ranked operations.
• May be phased out
• Cost-based query optimization: the optimizer examines alternative access
paths and operator algorithms and chooses the execution plan with lowest
estimate cost.
• The query cost is calculated based on the estimated usage of resources such as
I/O, CPU and memory needed.
• Application developers could specify hints to the ORACLE query
optimizer.
• application developer might know more information about the data.
• SELECT /*+ ...hint... */ [rest of query]
• SELECT /*+ index(t1 t1_abc) index(t2 t2_abc) */ COUNT(*)
FROM t1, t2
WHERE t1.col1 = t2.col1;
ICS 424 - 01 (072)
Query Processing and
Optimization
34
Summary
•
•
•
•
•
•
Background review
Processing a query
SQL queries and relational algebra
Implementing basic query operations
Heuristics-based query optimization
Overview of query optimization in Oracle
ICS 424 - 01 (072)
Query Processing and
Optimization
35