15.1 – INTRODUCTION TO PHYSICAL-QUERY-PLAN OPERATORS
PRESENTED BY: JASON CHEE
QUERY PROCESSOR
• Query Processor: the group of DBMS components that turns user queries and data-modification commands into a sequence of database operations and executes those operations.
QUERY COMPILATION (CH 16)
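• Three parts:
  • Parsing: constructs a parse tree for the query.
  • Query rewrite: converts the parse tree into an initial relational-algebra plan, then transforms it into an equivalent logical query plan that is expected to run faster.
  • Physical plan generation: converts the logical query plan into a physical query plan by selecting an algorithm for each operator and an order of execution.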
PHYSICAL-QUERY-PLAN OPERATORS
• Physical operators are often implementations of relational algebra operators.
• Examples of physical-plan elements with no relational algebra counterpart:
  • Scan: brings each tuple of some relation into main memory.
  • Iterators: the mechanism by which the operators of a physical query plan pass requests for tuples, and the answers, among themselves.
SCANNING TABLES
• Scanning: reading the entire contents of a relation R.
• Table-scan:
  • Relation R is stored in secondary memory.
  • The blocks containing tuples of R are known, and they can be fetched one by one.
• Index-scan:
  • If there is an index on any attribute of R, we may be able to use this index to get all the tuples of R.
SORTING WHILE SCANNING TABLES
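• There are several reasons to sort a relation as we read its tuples. Examples:
  • The query has an ORDER BY clause.
  • Some operations require their argument relations to be sorted.
• The physical-query-plan operator sort-scan can be implemented in many ways. For example, if R has a B-tree index on the sort attribute a, scanning that index yields the tuples of R in sorted order.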
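Another way to implement sort-scan, sketched below, is to read every block and sort in main memory. This is only an illustration under an assumption the slides do not state: that all of R fits in memory. The names `sort_scan`, `Block`, and `r_blocks` are hypothetical.

```python
from typing import Iterator, List, Tuple

# Hypothetical stand-in: a "block" is just a list of tuples held in memory.
Block = List[Tuple]


def sort_scan(blocks: List[Block], key_index: int) -> Iterator[Tuple]:
    """Read every block of R, sort all tuples on one attribute, emit in order.

    Assumption (not from the slides): R fits entirely in main memory.
    """
    # Table-scan step: collect the tuples block by block.
    tuples = [t for block in blocks for t in block]
    # Sort on the attribute at position key_index.
    tuples.sort(key=lambda t: t[key_index])
    yield from tuples


# Made-up relation R(a, name) spread over two blocks.
r_blocks = [[(3, "c"), (1, "a")], [(2, "b")]]
sorted_r = list(sort_scan(r_blocks, key_index=0))
print(sorted_r)
```

When R does not fit in memory, a multi-pass external sort would be needed instead; that case is outside this sketch.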
COMPUTATIONAL MODEL FOR PHYSICAL OPERATORS
• A query consists of several relational algebra operations, and the corresponding physical query plan is composed of several physical operators.
• We estimate the cost of an operator by the number of disk I/Os it performs.
• To compare algorithms, we assume that the arguments of any operator are found on disk, but the result of the operator is left in main memory.
  • The size of the result does not depend on the algorithm used to compute it.
  • If the final result is written to disk, that write is a cost of the query, not of the algorithm.
PARAMETERS FOR MEASURING COSTS
• M: number of main-memory buffers available to the operator, where each buffer is the size of one disk block. M could be smaller than total main memory if several operators share memory.
• B or B(R): size of relation R in blocks, i.e. the number of blocks needed to hold all tuples of R.
• T or T(R): number of tuples in R.
  • T/B = number of tuples of R per block.
• V(R, a): number of distinct values in column a of R. For several attributes, V(R, [a1, a2, …, an]) is the number of distinct combinations of values in columns a1, a2, …, an.
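To make the parameters concrete, the following sketch computes T, B, T/B, and V for a tiny made-up relation. The rows, the figure of 2 tuples per block, and the helper names `num_blocks` and `distinct_values` are all assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Sequence

Row = Dict[str, object]


def num_blocks(T: int, tuples_per_block: int) -> int:
    """B(R): blocks needed to hold T tuples, assuming R is packed tightly."""
    return math.ceil(T / tuples_per_block)


def distinct_values(rows: List[Row], attrs: Sequence[str]) -> int:
    """V(R, [a1, ..., an]): number of distinct value combinations."""
    return len({tuple(row[a] for a in attrs) for row in rows})


# Made-up relation R(a, b) with four tuples.
rows = [
    {"a": 1, "b": "x"},
    {"a": 1, "b": "y"},
    {"a": 2, "b": "x"},
    {"a": 1, "b": "x"},
]

T = len(rows)                              # T(R) = 4
B = num_blocks(T, tuples_per_block=2)      # B(R) = 2
v_a = distinct_values(rows, ["a"])         # V(R, a) = 2
v_ab = distinct_values(rows, ["a", "b"])   # V(R, [a, b]) = 3
print(T, B, T / B, v_a, v_ab)
```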
I/O COST FOR SCAN OPERATORS
• Table-scan:
  • If R is clustered, the scan needs B disk I/Os.
  • If R is not clustered, the scan could need up to T disk I/Os, i.e. as many block reads as there are tuples.
• Index-scan:
  • If the index contains all the column data the query needs, the table itself does not have to be accessed. Example:
    SELECT category_id FROM tbl WHERE category_id BETWEEN 10 AND 100;
  • The number of disk I/Os is then often smaller than B.
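The table-scan estimates above can be written as a small cost function. This is a sketch of the slide's rule of thumb only; the function name and the sample values of B and T are hypothetical, and the unclustered figure is a worst-case upper bound rather than an exact count.

```python
def table_scan_cost(B: int, T: int, clustered: bool) -> int:
    """Estimated disk I/Os for a table-scan of R.

    B: number of blocks holding R; T: number of tuples in R.
    """
    if clustered:
        # Tuples of R are packed into B blocks, each read once.
        return B
    # Worst case for an unclustered R: every tuple sits in a different block.
    return T


# Hypothetical relation: 20,000 tuples stored in 1,000 blocks.
B, T = 1000, 20000
clustered_cost = table_scan_cost(B, T, clustered=True)
unclustered_cost = table_scan_cost(B, T, clustered=False)
print(clustered_cost, unclustered_cost)
```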
ITERATORS FOR IMPLEMENTATION OF PHYSICAL OPERATORS
• A design pattern for implementing physical operators
• Three methods:
  1) Open(): initializes data structures.
  2) GetNext(): returns the next tuple in the result and adjusts data structures as necessary. If there are no more tuples, it returns "not found".
  3) Close(): ends the iteration after all tuples have been obtained. It calls Close() on any arguments of the operator.
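The three methods can be sketched as a small Python interface, with one leaf operator and one operator that pulls tuples from its argument. Everything here is illustrative: the class names, the `NOT_FOUND` sentinel, and the `run` helper are assumptions, not part of any particular DBMS.

```python
from abc import ABC, abstractmethod

NOT_FOUND = object()  # hypothetical sentinel standing in for "not found"


class Operator(ABC):
    """The iterator interface: Open, GetNext, Close."""

    @abstractmethod
    def open(self): ...

    @abstractmethod
    def get_next(self): ...

    @abstractmethod
    def close(self): ...


class ListSource(Operator):
    """Leaf operator that iterates over an in-memory list of tuples."""

    def __init__(self, tuples):
        self.tuples = tuples
        self.pos = None

    def open(self):
        self.pos = 0  # initialize the cursor

    def get_next(self):
        if self.pos >= len(self.tuples):
            return NOT_FOUND
        t = self.tuples[self.pos]
        self.pos += 1  # adjust state for the next call
        return t

    def close(self):
        self.pos = None


class Select(Operator):
    """Passes on only the tuples of its argument that satisfy a predicate."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def open(self):
        self.child.open()

    def get_next(self):
        while True:
            t = self.child.get_next()
            if t is NOT_FOUND or self.predicate(t):
                return t

    def close(self):
        self.child.close()  # Close() is propagated to the argument


def run(op):
    """Drive an operator to completion and collect its output."""
    op.open()
    out = []
    while True:
        t = op.get_next()
        if t is NOT_FOUND:
            break
        out.append(t)
    op.close()
    return out


plan = Select(ListSource([1, 5, 12, 7, 30]), lambda t: t >= 7)
result = run(plan)
print(result)
```

Because each operator only asks its argument for one tuple at a time, the intermediate result of `ListSource` never has to be materialized as a whole.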
TABLE-SCAN ITERATOR METHODS
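A minimal sketch of how Open(), GetNext(), and Close() might look for a table-scan, assuming the blocks of R are modeled as in-memory lists of tuples. The class name `TableScan`, the cursor fields `b` and `t`, and the `NOT_FOUND` sentinel are hypothetical; a real implementation would fetch blocks from disk.

```python
NOT_FOUND = object()  # hypothetical sentinel standing in for "not found"


class TableScan:
    """Iterator over the tuples of R, one block at a time."""

    def __init__(self, blocks):
        self.blocks = blocks  # stand-in for the known blocks of R
        self.b = None         # index of the current block
        self.t = None         # index of the next tuple within that block

    def open(self):
        # Position the cursor at the first tuple of the first block.
        self.b = 0
        self.t = 0

    def get_next(self):
        # Move past exhausted (or empty) blocks.
        while self.b < len(self.blocks) and self.t >= len(self.blocks[self.b]):
            self.b += 1
            self.t = 0
        if self.b >= len(self.blocks):
            return NOT_FOUND  # no more blocks, so no more tuples
        tup = self.blocks[self.b][self.t]
        self.t += 1
        return tup

    def close(self):
        # A table-scan has no arguments to close; just drop the cursor.
        self.b = None
        self.t = None


# Made-up relation stored in three blocks, one of them empty.
scan = TableScan([[("a", 1), ("b", 2)], [], [("c", 3)]])
scan.open()
scanned = []
while True:
    tup = scan.get_next()
    if tup is NOT_FOUND:
        break
    scanned.append(tup)
exhausted = scan.get_next() is NOT_FOUND
scan.close()
print(scanned)
```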
THANK YOU
• Please feel free to ask any questions.