2010/2011
Information Systems Department
Third Year IS and ALL Minor IS
Database II
Sheet-2: "Query Processing and Optimization Sheet"

Question 1:

Supplier(Supp#, Name, City, Specialty)
Project(Proj#, Name, City, Budget)
Order(Supp#, Proj#, Part-name, Quantity, Cost)

SELECT Supplier.Name, Project.Name
FROM Supplier, Order, Project
WHERE Supplier.Supp# = Order.Supp#
AND Order.Proj# = Project.Proj#
AND Project.Budget > 10,000,000
AND Supplier.City = 'New York'

Assume that the Supplier relation has 20,000 tuples, the Project relation has 400,000 tuples, and the Order relation has 1,000,000 tuples.

a. Write a relational algebra expression that is equivalent to the above query and draw the canonical query tree for this expression.

Solution:
The relational algebra expression is:

Π Supplier.Name, Project.Name (σ Supplier.Supp# = Order.Supp# ∧ Order.Proj# = Project.Proj# ∧ Project.Budget > 10,000,000 ∧ Supplier.City = 'New York' ((Supplier × Order) × Project))

Canonical query tree (each operator's inputs are indented beneath it):

Π Supplier.Name, Project.Name
  σ Supplier.Supp# = Order.Supp# ∧ Order.Proj# = Project.Proj# ∧ Project.Budget > 10,000,000 ∧ Supplier.City = 'New York'
    ×
      ×
        Supplier
        Order
      Project

b. Apply the heuristic optimization transformation rules to find an efficient query execution plan for the above query. Assume that the number of suppliers in New York is larger than the number of projects with budgets above $10,000,000. (Show all your steps.)

Solution:

1. Replace each Cartesian product and its join condition with a join:

Π Supplier.Name, Project.Name
  σ Project.Budget > 10,000,000 ∧ Supplier.City = 'New York'
    ⋈ Order.Proj# = Project.Proj#
      ⋈ Supplier.Supp# = Order.Supp#
        Supplier
        Order
      Project

2. Move selections down the tree:

Π Supplier.Name, Project.Name
  ⋈ Order.Proj# = Project.Proj#
    ⋈ Supplier.Supp# = Order.Supp#
      σ Supplier.City = 'New York'
        Supplier
      Order
    σ Project.Budget > 10,000,000
      Project

3. Move projections down the tree:

Π Supplier.Name, Project.Name
  ⋈ Order.Proj# = Project.Proj#
    ⋈ Supplier.Supp# = Order.Supp#
      Π Supplier.Name, Supplier.Supp#
        σ Supplier.City = 'New York'
          Supplier
      Π Order.Supp#, Order.Proj#
        Order
    Π Project.Name, Project.Proj#
      σ Project.Budget > 10,000,000
        Project

4. Perform the most restrictive conditions first.
Since σ Project.Budget > 10,000,000 yields fewer tuples than σ Supplier.City = 'New York', move Project to the left side of the tree so that it is joined first:

Π Supplier.Name, Project.Name
  ⋈ Order.Supp# = Supplier.Supp#
    ⋈ Project.Proj# = Order.Proj#
      Π Project.Name, Project.Proj#
        σ Project.Budget > 10,000,000
          Project
      Π Order.Supp#, Order.Proj#
        Order
    Π Supplier.Name, Supplier.Supp#
      σ Supplier.City = 'New York'
        Supplier

c. If no project has a budget as large as 10,000,000, how many tuples are retrieved by the query? Justify your answer.

Solution:
It will return zero tuples, since one side of the AND (Project.Budget > 10,000,000) evaluates to false for every tuple.

d. If the query is changed to:

SELECT Supplier.Name, Project.Name
FROM Supplier, Order, Project
WHERE Supplier.Supp# = Order.Supp#
AND Order.Proj# = Project.Proj#
AND Project.Budget > 10,000,000
OR Supplier.City = 'New York'

If all suppliers are in New York, how many tuples will the query retrieve? Justify your answer.

Solution:
It will return 20,000 × 400,000 × 1,000,000 = 8 × 10^15 tuples. Because AND binds more tightly than OR, Supplier.City = 'New York' stands alone as one side of the OR; it is true for every supplier, so every combination of tuples qualifies.

e. If you have the query mentioned in point d and you know that there are no suppliers in the city 'New York', how many tuples will the query retrieve? Justify your answer.

Solution:
Unknown. The Supplier.City = 'New York' side of the OR is always false, so the result is exactly the set of tuples that satisfy the two join conditions and Project.Budget > 10,000,000, and that count depends on the data.

f. Draw the canonical tree for the query mentioned in point d.

Question 2:

a. Given two relations R(A, C, D) and S(E, B, F). R has 10,000 records, each 150 bytes long; S has 3,000,000 records, each 600 bytes long; the block size is 3000 bytes. Assume that there is an index on attribute S.B of height XB = 4 and an index on R.A of height XA = 2. The number of distinct values of attribute B in S is 2000. Calculate the number of block accesses needed to join the two relations (R ⋈ R.A=S.B S), neglecting the cost of writing the join result back to disk.
Solution:
bR = (# records in R × record size of R) / block size = 10,000 × 150 / 3000 = 500 blocks
bS = (# records in S × record size of S) / block size = 3,000,000 × 600 / 3000 = 600,000 blocks
sB: selection cardinality of B (average number of records that have a given value of B as a foreign key in R) = rR / DB = 10,000 / 2000 = 5
Cost using XA = bS + (rS × (XA + 1)) = 600,000 + (3,000,000 × (2 + 1)) = 9,600,000 block accesses
Cost using XB = bR + (rR × (XB + sB)) = 500 + (10,000 × (4 + 5)) = 90,500 block accesses

b. Given relations STUDENT(SSid, Name, Major, GPA) and ATTENDS(ASid, Ccode). Assume that STUDENT has 5000 records with a blocking factor (bfr STUDENT) of 250, and that ATTENDS has 25000 records with a blocking factor (bfr ATTENDS) of 500. Assume that the available buffer space is 6 blocks. Compute the cost of using Nested-Loop Join to perform the join STUDENT ⋈ SSid=ASid ATTENDS; neglect the cost of writing the result back to disk.

Solution (3 grades):
|STUDENT| = 5000 records
|ATTENDS| = 25000 records
bfr STUDENT = 250 records
bfr ATTENDS = 500 records
b STUDENT = 5000 / 250 = 20 blocks
b ATTENDS = 25000 / 500 = 50 blocks
Buffer size (B) = 6 blocks

Using the nested-loop algorithm:
Cost to perform join = [join cost] + [cost of writing the join result back to disk], where the second term is zero here because it is neglected.

Join cost = b STUDENT + (⌈b STUDENT / (B − 2)⌉ × b ATTENDS) = 20 + (⌈20 / 4⌉ × 50) = 20 + (5 × 50) = 270 block accesses.
OR
Join cost = b ATTENDS + (⌈b ATTENDS / (B − 2)⌉ × b STUDENT) = 50 + (⌈50 / 4⌉ × 20) = 50 + (13 × 20) = 310 block accesses.

The first option is the more efficient way to perform the nested-loop join, because it uses the smaller relation as the outer loop.

Question 3:

Consider the schema:
Product(ProductId, ProductName, UnitPrice, InStockQuantity, SupplierId)
Supplier(SupplierId, SupplierName, Country, City, Contact#)
OrderDetail(ProductId, OrderId, Quantity, TotalPrice)
Order(OrderID, OrderDate, RequiredDate)

Select Product.ProductName, OrderDetail.Quantity
From product, supplier, OrderDetail, order
Where (OrderDetail.Quantity > 400
Or ((Supplier.City = 'Berlin' And order.OrderDate > '26-3-2007')
Or OrderDetail.Quantity <= 400))
Or (Product.SupplierId = Supplier.SupplierId
And OrderDetail.ProductId = Product.ProductId
And OrderDetail.OrderID = Order.OrderID)

a) Can you rewrite the above query with fewer constraints without affecting its result? If yes, write down the new "reduced" query and justify your answer.

Solution:
The reduced query will be:

Select Product.ProductName, OrderDetail.Quantity
From product, supplier, OrderDetail, order
Where OrderDetail.Quantity > 400
Or OrderDetail.Quantity <= 400

Every non-null Quantity satisfies either Quantity > 400 or Quantity <= 400, so these two conditions alone already make the whole OR expression true; the remaining conditions cannot change the result.

b) If you know that the Product table has 100 tuples, the Order table has 200 tuples, the OrderDetail table has 300 tuples, the Supplier table has 450 tuples, and no tuples satisfy the condition OrderDetail.ProductId = Product.ProductId, how many tuples will be returned as an answer to that query?

Solution:
# of returned tuples = 100 × 200 × 300 × 450 = 2,700,000,000

Question 4:

a. "Pushing down the selection before the Join is useful". Give an example to show that the above statement is not always correct.

Solution:
If one of the relations is very large and has an index on the join attributes but no index on the selection attributes, it can be better to perform the join first using the index and then apply the selection to the relatively small join result.

b. Given two relation schemas R(A, B) and S(A, B), what is the condition that guarantees that the following rule is always true?
ΠA(R − S) and ΠA(R) − ΠA(S) are equivalent.

Solution:
The condition is that R and S have no attributes other than A.

c. A basic rule in optimizing queries is to push projections down the query tree. Explain how this rule reduces the query processing cost.

Solution:
Pushing projections down reduces the number of fields in each record; hence it reduces the size of the record.
Consequently, each block holds more records than before the projection was pushed down (i.e., the blocking factor of the relation increases), so fewer blocks have to be read and written by the operators above it.

d. Given a relation instance R that occupies 1230 disk blocks. Using the sorting algorithm to sort the instance R:
i. What is the buffer size that will give the worst performance?
ii. What is the buffer size that will give the best performance?

Solution:
i. The buffer size that will give the worst performance is 3 blocks, allocated as follows:
One for SORT
One for MERGE
One for OUTPUT
An answer of 2 blocks is also accepted, allocated as follows:
One for SORT
One for OUTPUT
ii. The buffer size that will give the best performance is any number >= 1230.
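The effect of the buffer size in part d can be checked numerically. The following is a minimal sketch, not part of the sheet, that assumes the common textbook cost model for external merge sort: the sort phase creates ⌈b / nB⌉ initial runs, each merge pass merges nB − 1 runs at a time, and every pass reads and writes all b blocks once. The function name external_sort_cost and this cost model are assumptions introduced for illustration; the model needs at least 3 buffer blocks, so the 2-block answer is not covered.

```python
def external_sort_cost(b, n_b):
    """Estimate external merge-sort cost for a file of b blocks and n_b buffer blocks.

    Assumed model: one sort pass that builds the initial runs, then merge passes
    of degree n_b - 1; each pass reads and writes every block once.
    Returns (initial_runs, merge_passes, block_accesses).
    """
    if n_b < 3:
        raise ValueError("this cost model needs at least 3 buffer blocks")

    initial_runs = -(-b // n_b)      # ceiling of b / n_b
    degree = n_b - 1                 # runs merged per step: all buffers but one output
    runs = initial_runs
    merge_passes = 0
    while runs > 1:
        runs = -(-runs // degree)    # ceiling of runs / degree
        merge_passes += 1

    block_accesses = 2 * b * (1 + merge_passes)
    return initial_runs, merge_passes, block_accesses


if __name__ == "__main__":
    for n_b in (3, 10, 100, 1230):
        print(n_b, external_sort_cost(1230, n_b))
```

Under this model, 3 buffer blocks give 410 initial runs and 9 merge passes, while any buffer of 1230 blocks or more sorts the whole file in memory with no merge pass at all, which matches the worst and best cases in the solution.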