Information Systems Department

2010/2011
Third Year IS and ALL Minor IS
Database II
Sheet-2: "Query Processing and Optimization Sheet"
Question 1:
Supplier(Supp#, Name, City, Specialty)
Project(Proj#, Name, City, Budget)
Order(Supp#, Proj#, Part-name, Quantity, Cost)
SELECT Supplier.Name, Project.Name
FROM Supplier, Order, Project
WHERE Supplier.Supp# = Order.Supp#
AND Order.Proj# = Project.Proj#
AND Project.Budget > 10,000,000
AND Supplier.City = 'New York'
Assume that the Supplier relation has 20,000 tuples, the Project relation has 400,000 tuples, and
the Order relation has 1,000,000 tuples.
a. Write a relational algebra expression that is equivalent to the above query and draw the
canonical query tree for this expression.
Solution:
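The relational algebra expression is:
Π Supplier.Name, Project.Name (σ Supplier.Supp# = Order.Supp# ∧ Order.Proj# = Project.Proj# ∧ Project.Budget > 10,000,000 ∧ Supplier.City = 'New York' ((Supplier × Order) × Project))
The canonical query tree is:
Π Supplier.Name, Project.Name
└── σ Supplier.Supp# = Order.Supp# ∧ Order.Proj# = Project.Proj# ∧ Project.Budget > 10,000,000 ∧ Supplier.City = 'New York'
    └── ×
        ├── ×
        │   ├── Supplier
        │   └── Order
        └── Project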
b. Apply the heuristic optimization transformation rules to find an efficient query execution
plan for the above query. Assume that the number of suppliers in New York is larger
than the number of projects with budgets of more than $10,000,000. (Show all your
steps.)
Solution
1. Convert each Cartesian product followed by a join condition into a join:
Π Supplier.Name, Project.Name
└── σ Project.Budget > 10,000,000 ∧ Supplier.City = 'New York'
    └── ⋈ Order.Proj# = Project.Proj#
        ├── ⋈ Supplier.Supp# = Order.Supp#
        │   ├── Supplier
        │   └── Order
        └── Project
2. Move selection down the tree:
Π Supplier.Name, Project.Name
└── ⋈ Order.Proj# = Project.Proj#
    ├── ⋈ Supplier.Supp# = Order.Supp#
    │   ├── σ Supplier.City = 'New York'
    │   │   └── Supplier
    │   └── Order
    └── σ Project.Budget > 10,000,000
        └── Project
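3. Move projection down (Order must keep both Supp# and Proj#, since both joins need them):
Π Supplier.Name, Project.Name
└── ⋈ Order.Proj# = Project.Proj#
    ├── ⋈ Supplier.Supp# = Order.Supp#
    │   ├── Π Supplier.Name, Supplier.Supp#
    │   │   └── σ Supplier.City = 'New York'
    │   │       └── Supplier
    │   └── Π Order.Supp#, Order.Proj#
    │       └── Order
    └── Π Project.Name, Project.Proj#
        └── σ Project.Budget > 10,000,000
            └── Project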
4. Perform the most restrictive operations first. Since σ Project.Budget > 10,000,000 yields fewer
tuples than σ Supplier.City = 'New York', move Project to the left side of the tree so that it is joined with Order first:
Π Supplier.Name, Project.Name
└── ⋈ Order.Supp# = Supplier.Supp#
    ├── ⋈ Project.Proj# = Order.Proj#
    │   ├── Π Project.Name, Project.Proj#
    │   │   └── σ Project.Budget > 10,000,000
    │   │       └── Project
    │   └── Π Order.Supp#, Order.Proj#
    │       └── Order
    └── Π Supplier.Name, Supplier.Supp#
        └── σ Supplier.City = 'New York'
            └── Supplier
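As a rough sanity check on the transformation steps, the following sketch evaluates the canonical plan (Cartesian product, then selection, then projection) and the reordered plan on a tiny in-memory data set and compares the results. The sample rows and the tuple layouts are assumptions made for illustration only; they are not part of the sheet.

```python
from itertools import product

# Toy data (assumed for illustration). Tuple layouts:
# suppliers: (supp, name, city); projects: (proj, name, budget); orders: (supp, proj)
suppliers = [(1, "Acme", "New York"), (2, "Bolt", "Boston")]
projects = [(10, "Bridge", 20_000_000), (20, "Road", 5_000_000)]
orders = [(1, 10), (1, 20), (2, 10)]


def canonical_plan():
    # Cartesian product first, then one big selection, then the projection.
    return sorted(
        (s[1], p[1])
        for s, o, p in product(suppliers, orders, projects)
        if s[0] == o[0]
        and o[1] == p[0]
        and p[2] > 10_000_000
        and s[2] == "New York"
    )


def optimized_plan():
    # Selections and projections are applied to the base relations first.
    big_projects = {p[0]: p[1] for p in projects if p[2] > 10_000_000}
    ny_suppliers = {s[0]: s[1] for s in suppliers if s[2] == "New York"}
    # Join the (more restrictive) Project side with Order first ...
    project_orders = [(o[0], big_projects[o[1]]) for o in orders if o[1] in big_projects]
    # ... then join the intermediate result with the New York suppliers.
    return sorted(
        (ny_suppliers[supp], pname)
        for supp, pname in project_orders
        if supp in ny_suppliers
    )


canonical_result = canonical_plan()
optimized_result = optimized_plan()
print(canonical_result == optimized_result)
```

Both plans return the same pairs; the reordered plan simply builds much smaller intermediate results.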
c. If no project has a budget as large as 10,000,000, how many tuples are retrieved by the query?
Justify your answer.
Solution
It will return zero tuples, since one operand of the AND (Project.Budget > 10,000,000) evaluates to false for every tuple.
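The zero-tuple claim can be illustrated with a small SQLite session. This is only a sketch: the table and column names (`orders`, `supp_id`, `proj_id`) are simplified stand-ins for the sheet's Order, Supp# and Proj#, and the rows are made up so that no project exceeds the budget threshold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical schema and data: no project budget exceeds 10,000,000.
conn.executescript("""
CREATE TABLE supplier (supp_id INTEGER, name TEXT, city TEXT);
CREATE TABLE project  (proj_id INTEGER, name TEXT, budget INTEGER);
CREATE TABLE orders   (supp_id INTEGER, proj_id INTEGER);
INSERT INTO supplier VALUES (1, 'S1', 'New York'), (2, 'S2', 'Boston');
INSERT INTO project  VALUES (10, 'P1', 5000000), (20, 'P2', 9000000);
INSERT INTO orders   VALUES (1, 10), (1, 20), (2, 10);
""")

QUERY_AND = """
SELECT supplier.name, project.name
FROM supplier, orders, project
WHERE supplier.supp_id = orders.supp_id
  AND orders.proj_id = project.proj_id
  AND project.budget > 10000000
  AND supplier.city = 'New York'
"""

rows_and = conn.execute(QUERY_AND).fetchall()
print(len(rows_and))  # no row can satisfy the budget condition
```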
d. If the query changed to:
SELECT Supplier.Name, Project.Name
FROM Supplier, Order, Project
WHERE Supplier.Supp# = Order.Supp#
AND Order.Proj# = Project.Proj#
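AND Project.Budget > 10,000,000
OR Supplier.City = 'New York'
If all suppliers are in New York, how many tuples will the query retrieve? Justify your
answer.
Solution:
It will return 20,000 × 400,000 × 1,000,000 = 8 × 10^15 tuples. AND binds more tightly than OR, so the
condition is (the three AND-ed conditions) OR Supplier.City = 'New York'; since one side of the OR
is always true, every tuple of the Cartesian product qualifies.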
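The effect of the OR can be reproduced on a toy database. In this sketch the schema names are simplified, hypothetical stand-ins for the sheet's relations, and the rows are invented so that every supplier is in New York; the query then returns one row per element of the Cartesian product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy data: every supplier is located in New York.
conn.executescript("""
CREATE TABLE supplier (supp_id INTEGER, name TEXT, city TEXT);
CREATE TABLE project  (proj_id INTEGER, name TEXT, budget INTEGER);
CREATE TABLE orders   (supp_id INTEGER, proj_id INTEGER);
INSERT INTO supplier VALUES (1, 'S1', 'New York'), (2, 'S2', 'New York');
INSERT INTO project  VALUES (10, 'P1', 5000000), (20, 'P2', 20000000);
INSERT INTO orders   VALUES (1, 10), (1, 20), (2, 10);
""")

# AND has higher precedence than OR, so the city test stands alone.
QUERY_OR = """
SELECT supplier.name, project.name
FROM supplier, orders, project
WHERE supplier.supp_id = orders.supp_id
  AND orders.proj_id = project.proj_id
  AND project.budget > 10000000
   OR supplier.city = 'New York'
"""

rows_or = conn.execute(QUERY_OR).fetchall()
n_supplier = conn.execute("SELECT COUNT(*) FROM supplier").fetchone()[0]
n_project = conn.execute("SELECT COUNT(*) FROM project").fetchone()[0]
n_orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(len(rows_or), n_supplier * n_project * n_orders)
```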
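e. Given the query in part d, and knowing that there are no suppliers in the city 'New York',
how many tuples will the query retrieve? Justify your answer.
Solution
Unknown. The city condition is then always false, so the result is exactly the set of joined tuples with Project.Budget > 10,000,000, whose number cannot be determined from the given information.
f. Draw the canonical query tree for the query in part d.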
Question 2:
a. Given two relations R(A, C, D) and S(E, B, F): R has 10,000 records, each 150 bytes long;
S has 3,000,000 records, each 600 bytes long; and a block is 3000 bytes. Assume
that there is an index on attribute S.B of height XB = 4 and an index on R.A of height
XA = 2. The number of distinct values of attribute B in S is 2000. Calculate the number of
block accesses needed to join the two relations (R ⋈ R.A=S.B S), neglecting the cost of
writing the join result back to disk.
Solution:
bR = (# records in R × record size of R) / block size = 10,000 × 150 / 3000 = 500 blocks
bS = (# records in S × record size of S) / block size = 3,000,000 × 600 / 3000 = 600,000 blocks
sB: selection cardinality of B (average number of records that have a given value of B as a foreign
key in R) = rR / DB = 10,000 / 2000 = 5
Cost using XA = bS + (rS × (XA + 1)) = 600,000 + (3,000,000 × (2 + 1)) = 9,600,000 block
accesses
Cost using XB = bR + (rR × (XB + sB)) = 500 + (10,000 × (4 + 5)) = 90,500 block accesses
Using the index on S.B is therefore far cheaper.
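The arithmetic can be recomputed directly from the figures stated in the question. The sketch below assumes unspanned records (a whole number of records per block) and follows the sheet's own definition of sB as rR / DB; the helper name `blocks` is introduced here only for illustration.

```python
import math


def blocks(num_records: int, record_size: int, block_size: int) -> int:
    """Number of blocks needed, assuming unspanned records."""
    bfr = block_size // record_size  # records that fit in one block
    return math.ceil(num_records / bfr)


# Figures taken from the question.
r_R, size_R = 10_000, 150
r_S, size_S = 3_000_000, 600
block_size = 3000
x_A, x_B = 2, 4
distinct_B = 2000

b_R = blocks(r_R, size_R, block_size)
b_S = blocks(r_S, size_S, block_size)
s_B = r_R // distinct_B  # selection cardinality as defined in the sheet (rR / DB)

# Single-loop join: scan one relation, probe the index on the other.
cost_using_x_A = b_S + r_S * (x_A + 1)
cost_using_x_B = b_R + r_R * (x_B + s_B)

print(b_R, b_S, s_B, cost_using_x_A, cost_using_x_B)
```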
b. Given the relations STUDENT(SSid, Name, Major, GPA) and ATTENDS(ASid, Ccode).
Assume that STUDENT has 5000 records and a blocking factor (bfr STUDENT) of 250, and that
ATTENDS has 25000 records and a blocking factor (bfr ATTENDS) of 500. Assume that the
available buffer space is 6 blocks. Compute the cost of using the nested-loop join to perform the
join STUDENT ⋈ SSid=ASid ATTENDS; neglect the cost of writing
the result back to disk.
Solution (3 grades):
|STUDENT| = 5000 records
|ATTENDS| = 25000 records
bfr STUDENT = 250 records per block
bfr ATTENDS = 500 records per block
b STUDENT = 5000/250 = 20 blocks.
b ATTENDS = 25000/500 = 50 blocks.
Buffer size (B) = 6 blocks.
Using the nested-loop algorithm:
Cost to perform join = [join cost] + [cost of writing the join result back to disk], where the second term is neglected (zero).
With STUDENT as the outer relation:
Join cost = b STUDENT + (⌈b STUDENT/(B-2)⌉ × b ATTENDS) = 20 + (⌈20/4⌉ × 50) = 20 + (5 × 50) = 270 block accesses.
OR, with ATTENDS as the outer relation:
Join cost = b ATTENDS + (⌈b ATTENDS/(B-2)⌉ × b STUDENT) = 50 + (⌈50/4⌉ × 20) = 50 + (13 × 20) = 310 block accesses.
The first option is the more efficient one, because it uses the smaller relation as the outer loop.
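The two alternatives can be compared with a short calculation. This is a sketch under two assumptions: the blocking factor of ATTENDS is 500 (the value used in the solution), and the number of outer-relation chunks is rounded up to a whole number, as in the usual textbook form of the formula. The function name is hypothetical.

```python
import math


def nested_loop_cost(b_outer: int, b_inner: int, buffer_blocks: int) -> int:
    """Block accesses for a nested-loop join, ignoring the cost of writing the result.

    buffer_blocks - 2 blocks hold a chunk of the outer relation; one block is
    reserved for the inner relation and one for the output.
    """
    chunks = math.ceil(b_outer / (buffer_blocks - 2))
    return b_outer + chunks * b_inner


b_student = math.ceil(5000 / 250)   # 20 blocks
b_attends = math.ceil(25000 / 500)  # 50 blocks
buffer_blocks = 6

cost_student_outer = nested_loop_cost(b_student, b_attends, buffer_blocks)
cost_attends_outer = nested_loop_cost(b_attends, b_student, buffer_blocks)

print(cost_student_outer, cost_attends_outer)
```

Either way, placing the smaller relation in the outer loop gives the lower cost.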
Question 3:
Consider the schema:
Product(ProductId, ProductName, Unit price, InStockQuantity, SupplierId)
Supplier(SupplierId, SupplierName, Country, City, Contact#)
OrderDetail(ProductId, OrderId, Quantity, TotalPrice)
Order(OrderID, OrderDate, RequiredDate)
Select Product.ProductName, OrderDetail.Quantity
From product, supplier, OrderDetail, order
Where (OrderDetail.Quantity > 400
Or ((Supplier.City = 'Berlin' And order.OrderDate > '26-3-2007') Or OrderDetail.Quantity <= 400))
Or (Product.SupplierId = Supplier.SupplierId
And OrderDetail.ProductId = Product.ProductId
And OrderDetail.OrderID = Order.OrderID)
a) Can you rewrite the above query with fewer conditions without affecting its
result? If yes, write down the new “reduced” query and justify your answer.
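Solution
Yes. The reduced query will be:
Select Product.ProductName, OrderDetail.Quantity
From product, supplier, OrderDetail, order
Where OrderDetail.Quantity > 400 Or OrderDetail.Quantity <= 400
Justification: all top-level conditions are combined with OR, and for any non-null Quantity either Quantity > 400 or Quantity <= 400 holds, so the WHERE clause is true for every combination of tuples and the remaining conditions cannot change the result.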
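b) If you know that the Product table has 100 tuples, the Order table has 200 tuples, the OrderDetail
table has 300 tuples, the Supplier table has 450 tuples, and no tuples satisfy the condition
OrderDetail.ProductId = Product.ProductId, how many tuples will be returned as an answer
to that query?
Solution:
# of returned tuples = 100 × 200 × 300 × 450 = 2,700,000,000. The WHERE clause is true regardless of the join conditions (assuming Quantity is not null), so the whole Cartesian product is returned.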
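A brute-force check of the WHERE clause makes the argument concrete. The sketch below models the clause as a boolean function (the parameter names are hypothetical) and tries every combination of the other conditions for a few sample quantities; it assumes Quantity is never NULL, since SQL's three-valued logic would otherwise change the outcome.

```python
from itertools import product


def where_clause(quantity: int, city_is_berlin: bool, date_after: bool, joins_match: bool) -> bool:
    # Mirrors the structure of the original WHERE clause.
    return (
        quantity > 400
        or ((city_is_berlin and date_after) or quantity <= 400)
        or joins_match
    )


# Try all truth-value combinations for a few representative quantities.
always_true = all(
    where_clause(q, c, d, j)
    for q in (0, 399, 400, 401, 10_000)
    for c, d, j in product((False, True), repeat=3)
)

# With an always-true condition, the answer is the full Cartesian product.
result_size = 100 * 200 * 300 * 450

print(always_true, result_size)
```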
Question 4:
a. “Pushing down the selection before the Join is useful”.
Give an example to show that the above statement is not always correct.
Solution:
If one of the relations is very large and has an index on the join attributes but no index
on the selection attributes, it can be better to perform the join first using the index and then
apply the selection to the relatively small join result.
b. Given two relation schemas R(A, B) and S(A, B), what condition guarantees that
the following rule always holds: ΠA(R − S) and ΠA(R) − ΠA(S) are equivalent?
Solution
The condition is that R and S have no attributes other than A.
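A two-tuple counterexample shows why the extra attribute B breaks the rule. The sketch models relations as Python sets of tuples; the instances are made up for illustration.

```python
def project_a(relation):
    """Projection on the first attribute (A), with duplicate elimination."""
    return {row[0] for row in relation}


# R and S agree on A but differ on B.
R = {(1, 2)}
S = {(1, 3)}

lhs = project_a(R - S)              # the tuple (1, 2) survives the difference
rhs = project_a(R) - project_a(S)   # the value 1 is removed

# With A as the only attribute, both sides coincide.
R1 = {(1,), (2,)}
S1 = {(2,)}
lhs_single = project_a(R1 - S1)
rhs_single = project_a(R1) - project_a(S1)

print(lhs, rhs, lhs_single == rhs_single)
```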
c. A basic rule in optimizing queries is to push projection down the query tree. Explain how this
rule reduces the query processing cost.
Solution
Pushing projection down reduces the number of fields in each record and hence
the size of the record. Consequently, each block holds more records than before
(i.e., the blocking factor of the relation becomes greater), so intermediate results
occupy fewer blocks and need fewer block accesses.
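The effect on the blocking factor can be illustrated numerically. All figures in this sketch (block size, record sizes, record count) are hypothetical and chosen only to show the direction of the change; unspanned records are assumed.

```python
import math

# Hypothetical figures, for illustration only.
block_size = 3000          # bytes per block
num_records = 10_000
full_record_size = 150     # bytes per record before projection
projected_record_size = 50 # bytes per record after projection

bfr_before = block_size // full_record_size      # records per block before
bfr_after = block_size // projected_record_size  # records per block after

blocks_before = math.ceil(num_records / bfr_before)
blocks_after = math.ceil(num_records / bfr_after)

print(bfr_before, bfr_after, blocks_before, blocks_after)
```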
d. Given a relation instance R that occupies 1230 disk blocks, and using the sorting algorithm to sort
the instance R:
i. What buffer size will give the worst performance?
ii. What buffer size will give the best performance?
Solution
i. The buffer size that gives the worst performance is 3 blocks, used as follows:
One for SORT
One for MERGE
One for OUTPUT
An answer of 2 blocks is also accepted, with:
One for SORT
One for OUTPUT
ii. The buffer size that gives the best performance is any number >= 1230 blocks, since the whole relation then fits in memory.
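To see how strongly the buffer size matters, the sketch below estimates the cost of external merge sort using the common textbook model: initial runs of buffer-size blocks, a merge degree of buffer size minus one, and 2b block accesses per pass. The function name is hypothetical, and the model assumes at least 3 buffer blocks.

```python
import math


def sort_merge_cost(b: int, n_b: int):
    """Return (initial runs, merge passes, block accesses) for external merge sort.

    b   : number of blocks in the file
    n_b : number of available buffer blocks (assumed >= 3 in this model)
    """
    if n_b < 3:
        raise ValueError("this cost model assumes at least 3 buffer blocks")
    runs = math.ceil(b / n_b)        # sorted runs produced by the sort phase
    degree = min(n_b - 1, runs)      # runs merged together in each pass
    passes, remaining = 0, runs
    while remaining > 1:
        remaining = math.ceil(remaining / degree)
        passes += 1
    # Sort phase reads and writes every block once; so does each merge pass.
    return runs, passes, 2 * b + 2 * b * passes


worst = sort_merge_cost(1230, 3)
best = sort_merge_cost(1230, 1230)
print(worst, best)
```

Under this model, 3 buffer blocks require 410 initial runs and 9 merge passes, while 1230 or more buffer blocks need a single run and no merge pass.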