Matakuliah Tahun Versi : M0184 / Pengolahan Data Distribusi : 2005 : Session – 10 QUERY OPTIMIZATION OBJECTIVE • QUERY Optimization process • Distribute query optimization algorithms DISTRIBUTED QUERY OPTIMIZATION ALGORITHMS Algorith ms DIST. INGRES Optm. Timing Dynami c Obj. Function Resp.Time or Total Cost R* Static Total Cost SDD – 1 Static Msg Size Optm. Factors Msg Size, Proj Cost Networ Semi k Joins General General Yes General No / broadca st #Msg, General No Msg Size, / Local IO, CPU Source : Principles Of Distributed Database Systems DISTRIBUTED INGRES Algorithm • Consist of dynamically optimizing the processing strategy of a given query. • The objective function of the algorithm is to minimize response time and cost • General and broadcast network are consider, therefore the data unit can be transmitted from one site to all the other sites in a single transfer • This algorithm is executed by the site, called Master site where the query is initiated R * Algorithm • Uses compilation approach where an exhaustive search of all alternative strategies is performed in order to choose the one with the least cost. • Query compilation is a distributed task in R*, coordinated by master site, where the query is initiated. • To join two relation, there are three candidate sites : site for 1st relation, site for 2nd relation and 3rd site. • Two method for intersite data transfers : Ship Whole and Fetch as Needed SDD-1 Algorithm • Refinements of an initial feasible solution are recursively computed until no more cost improvement can be made. • It is devised for wide area point-to-point network • The cost of transferring the result to final site is ignored Method of Performing JOIN • On the basic relational operation (SELECT, JOIN , PROJECT) the most expensive in term in both time and money is JOIN. It causes more page swaps. • The frequency of page swaps is often used as the cost measure when modeling these system SEMI-JOIN Method Symbol : Semi Join of Relation R1 and Relation R2 R1 R1 R2 = π 1 (R1 R2 = R1 R2) (πR2) ………… (1.1) ………… (1.2) The second example has potential advantage over the first if R1 and R2 are at different sites A and B, and the execution of full JOIN between R1 and R2 is to be done at the 3rd Site C – Refer to SDD-1 algorithm system SEMI-JOIN Method Cont’d Procedure SEMI-JOIN • Project the JOIN attributes from R2 at B ( = πR2), after applying any required SELECTIONs • Transmit πR2 to A • Compute Semi Join of R1 at A • Move the result to C Non SEMI-JOIN Methods • In R*, execution is generated at compilation time and therefore it uses a static optimization algorithms. • Query can be compiled during times when the system is slack and the optimal access plan for their execution are determined then. • The nested loop and merge scan method are used for join in R* • The nested loop method scans one relation called outer relation for each tuple of the other relation, called the inner relation with a matching JOIN Value Non SEMI-JOIN Methods • 1. 2. 3. 4. Preceding query execution, four stages of “query preparation” are completed in R*, they are : Analyze the SQL query and perform object name resolution Look up the catalog to check authorization and view integration Plan the minimum cost access strategy for the global query Generate the access modules and store them at a master site Specific Issues for Modern Query Optimization • Is a static or a dynamic algorithm used • Are independence and uniformity assumptions made • What JOIN method is used • Is account taken of the effects of other system modules on optimization • Is simultaneous query traffic taken into account • Is the search for an optimum exhaustive • Is any semantic query transformation carried out • Is the influence of heterogeneity considered
© Copyright 2026 Paperzz