Emp б dname=Toy (Dept) - usc dblab

V Storage Manager
Shahram Ghandeharizadeh
Computer Science Department
University of Southern California
Two New Methods

void PrintStats()


Prints the cache hit rate and byte hit rate
observed by your implementation of the V
storage manager.
void DumpCache()


For each datazone name, show its object-id and
size occupying the cache. Conclude with a
summary table showing:
Data zone name, # of objects, average object
size, min object size, max object size
Minor Modifications

Create method should check to see if its
input dzname already exists. If so, it should
not create it a second time.
Trace Driven Evaluation

A trace consisting of the following information:
Trs,02/26/2009 23:07:48.4856586,Get,24,113392298,2797,0
Trs,02/26/2009 23:07:48.4856586,Get,17,120188330,64,0
Trs,02/26/2009 23:07:48.6262836,Save,24,89768490,1404,0
Trs,02/26/2009 23:07:48.7200336,Get,11,12671850,11155,0
Trs,02/26/2009 23:07:48.7200336,Get,5,449482986,229,0
Trs,02/26/2009 23:07:49.2200336,Delete,11,49174954,0,0
Trs,02/26/2009 23:07:49.2200336,Get,2,14161514,117,0
Trs,02/26/2009 23:07:49.2200336,Delete,11,444312042,0,0



Time stamp,
Possible commands are “Get”, “Delete” and “Save”,
Followed by data-zone name, key, and size of the value.
Trace Driven Evaluation



Gathered from a powerful production server
at myspace.
Idea: Submit the requests as fast as
possible. We probably cannot submit
requests at the rate that this server is
processing them.
Objective: A multi-threaded workload
generator that issues the requests as fast as
possible.
Workload Generator



Consists of a main thread and N worker
threads.
The main thread is in charge of reading a
trace file and populating main memory data
structures for the worker threads.
The worker thread reads its corresponding
main memory data structure element and
invokes the corresponding Get, Insert, and
Delete methods of V.
Main Thread



DZ-array: an array of 25 data-zones, each
represented as a Vdt element. The data field is 1
character long and its size is one. The data field
corresponds to the values 0 to 24.
Invokes the Create method of V storage manager for
each element of the array. If the data-zone already
exists, it returns with an error message.
Populates a linked list of P EachSecond elements,
each consisting of m trace elements:
struct EachSecond {
char Assigned[MAXTHREADS];
char Complete[MAXTHREADS];
int NumTraceElts[MAXTHREADS];
int *key[MAXTHREADS];
int *zoneid[MAXTHREADS];
int *size[MAXTHREADS];
cmndType *cmnd[MAXTHREADS];
struct EachSecond *next;
} OneSecond;

Starts up N worker threads, each with a unique-id 0 to N-1.
Worker Thread





A worker thread with id j
corresponds to element j of
the array in EachSecond.
Iterates from element 0 to
NumTraceElts[j] of cmnd,
zoneid, key, size.
cmndType is {Get, Save,
Delete}, invoking your
implementation of Get,
Insert, and Delete,
respectively.
Zoneid is an index into the
DZ-Array of the main
thread.
Key is a 4 byte char: Vdt
data is a char, its length is
4 bytes.
struct EachSecond {
char Assigned[MAXTHREADS];
char Complete[MAXTHREADS];
int NumTraceElts[MAXTHREADS];
int *key[MAXTHREADS];
int *zoneid[MAXTHREADS];
int *size[MAXTHREADS];
cmndType *cmnd[MAXTHREADS];
struct EachSecond *next;
} OneSecond;
Synchronization



The main thread and worker threads are
synchronized using handles (HANDLE hEvent).
Main thread creates an event and assigns it to
hEvent.
After populating the P elements of EachSecond
linked list and activating N threads to process the
elements, the main thread does:
For (int k = 0; k < N; k++) WaitForSingleObject(hEvent, INFINITE)



When a worker thread is done with one EachSecond
element, it sets the Complete[thread-id] to 1 and
does a SetEvent(hEvent).
Once the main-thread falls out of the for loop, it
checks to see if the first N elements of Complete
array in the current EachSecond are set. If so, this
means all the N threads are done with this
EachSecond element.
Main thread populates this EachSecond element
with additional trace elements and links it to the end
of the P list.
Termination Condition


Once the main thread hits the end of the
trace file, it will wait for the N threads to
complete.
Once a worker thread process all elements
of the P linked list (encounters a “Null” for
the next element), it terminates by returning.
An Overview of Query Optimization in
Relational Systems (by S. Chaudhuri)
Shahram Ghandeharizadeh
Computer Science Department
University of Southern California
Terminology

A SQL relational database management
system consists of:


Query optimizer
Query execution engine: implements a set of
physical operators.


An operator consumes one or more data streams and
produces an output data stream, e.g., sort, sequential
scan, index scan, nested loop join, etc.
Physical operator tree or execution plan ties different
operators with one another.
Terminology (Cont…)

Physical property is any characteristics of a plan that is not
shared by all plans for the same logical expression, but can
impact the cost of subsequent operations. Concept of
interesting order in System R.
Query Execution Engine


RSS of system R is a query execution
engine.
Is Berkeley-DB a query execution engine?
Query Optimizer

Generates the most efficient execution plan
for the execution engine.



Optimization as a search problem, providing:




Non-trivial because there can be a large number
of possible trees.
The optimization criteria might be different:
throughput versus response time.
A space of plans, the search space.
Cost estimation technique
Enumeration algorithm to search through the
search space.
Ideal optimizer:



The search space includes low cost plans,
Cost estimation techniques are accurate,
Enumeration algorithm is efficient.
Basic Estimation Framework
1.
2.
Collect statistical summaries of data that
has been stored.
Given an operator and the statistical
summary for each of its input data streams,
determine the:


Statistical summary of the output data stream,
Estimated cost of executing the operation.
EQUALITY JOIN (Cont…)

Example: Assume the following statistics on the Employee and
Department relations: t(Dept)=1000 tuples, P(Dept)=100 disk pages,
ν(Dept,dno)=1000, ν(Dept,dname)=500. t(Employee)=100,000 tuples
and P(Employee)=10,000 pages. Note that 10 tuples of each relation fit
on a disk page. Assume that a concatenation of one Employee and
one Dept record is wide enough to enable a disk page to hold five
such records.
Lets compare the cost of two alternative algebraic expressions for
processing a query that retrieves those employees that work for the
toy department:


б dname=Toy(Emp Dept)
Emp б dname=Toy(Dept)
EQUALITY JOIN (Cont…)

б dname=Toy(Emp
Dept)
t(Tmp1) = t(Emp) × t(Dept) / ν(Dept, dno)
Tmp2
= 100,000 ×1000 / 1000 = 100,000
P(Tmp1) = 100,000 / 5 = 20,000
б dname=Toy
C( ) = P(Dept) + P(Emp) ×P(Dept)
= 100 +10,000 ×100 = 1,000,100 (page nested loop)
Tmp1
Cw(Tmp1) = 20,000
Cw(б) = 20,000
t(Tmp2) = t(Tmp1) / ν(Dept, dname)
Emp
Dept
= 100,000 / 500 = 200
P(Tmp2) = 200 / 5 = 40
Cw(Tmp2) = 40
Cost = C( ) + Cw(Tmp1) + C(б) + Cw(Tmp2)
= 1,000,100 + 20,000 + 20,000 + 40
= 1,040,140 I/O
EQUALITY JOIN (Cont…)

б dname=Toy(Dept)
Emp
Tmp2
Emp
Tmp1
б dname=Toy
Dept
Cw(б) = 100
t(Tmp1) = t(Dept) / ν(Dept, dno)
= 1000 / 500 = 2
P(Tmp1) = 1
Cw(Tmp1) = 1
C( ) = P(Tmp1) + P(Tmp1) ×P(Emp)
= 1 +1×10,000 = 10,001 (page nested loop)
t(Tmp2) = t(Emp) × t(Tmp1) / ν(Emp,dno)
= 100,000 ×2 / 1000 = 200
P(Tmp2) = 200 / 5 = 40
Cw(Tmp2) = 40
Cost = C(б) + Cw(Tmp1) + C( ) + Cw(Tmp2)
= 100 + 1 + 10,001 + 40
= 10,142 I/O
Statistical Summaries of Data

Equi-height histograms
Review of System R





Use of dynamic programming and interesting order
as a heuristic.
Dynamic programming when optimizing a query
with multiple join predicates
Assumption: To obtain an optimal plan for a query
Q consisting of k joins, it suffices to consider only
the optimal plans for subexpressions of Q that
consist of (k-1) joins and extend those plans with an
additional join.
Prune suboptimal plans for subexpressions of Q
consisting of (k-1) joins.
For example, the optimal plan for {R1, R2, R3, R4} is
obtained by picking the plan with the cheapest cost
from among the optimal plans for:
Join({R1, R2, R3}, R4)
Join({R1, R2, R4}, R3)
Join({R1, R3, R4}, R2)
Join({R2, R3, R4}, R1)
Instead of analyzing O(n!) plans, it considers O(n 2n-1) plans
where n is the number of relations.
Left Outer Join

R Left Outer Join S
Merging Views

If one or more relations in a query are views,
and each is defined using a conjunctive
predicate, then the view definition can
simply be “unfolded” to obtain a single
block SQL query.

May not work when the views are more
complex using aggregates or eliminate
duplicates using “Select distinct”.
Nested Subqueries

“Flatten” nested queries whenever possible
?
Nested Subqueries

“Flatten” nested queries whenever possible
“Flattening” Queries with Aggregates

Aggregates are more tricky. Consider:
“Flattening” Queries with Aggregates

What is wrong with this?
SELECT
FROM
WHERE
GROUP BY
HAVING
Dept.name
Dept, Emp
Dept.name = Emp.dept_name
Dept.name
Dept.num-of-machines < Count(Emp.*)
“Flattening” Queries with Aggregates

What is wrong with this?
SELECT
FROM
WHERE
GROUP BY
HAVING
Dept.name
Dept, Emp
Dept.name = Emp.dept_name
Dept.name
Dept.num-of-machines < Count(Emp.*)
“Flattening” Queries with Aggregates

Correct flattening will use outerjoin: