Examples of Physical Query Plan Alternatives
Selections from Chapters 12, 14, 15

Query Optimization
• NOTE: Relational query languages provide a wide variety of ways in which a user can express a query.
• HENCE: The system has many options for evaluating a query.
• The optimizer is important for query performance:
- Generates alternative plans.
- Chooses the plan with the least estimated cost.
• Ideally, find the best plan. Realistically, consistently find a quite good one.

A Query (Evaluation) Plan
• An extended relational algebra tree.
• Annotations at each node indicate:
- access methods to use for each table;
- implementation methods used for each relational operator.
• Example plan (root first):
- sname (On-the-fly)
- bid=100, rating > 5 (On-the-fly)
- sid=sid (Simple Nested Loops)
- Reserves, Sailors

Query Optimization: Multi-operator Queries, Pipelined Evaluation
• On-the-fly: The result of one operator is pipelined to the next operator without creating a temporary table to hold the intermediate result.
• Materialized: Otherwise, intermediate results must be materialized.

Alternative Plans: Schema Examples
Sailors (sid: integer, sname: string, rating: integer, age: real)
Reserves (sid: integer, bid: integer, day: dates, rname: string)
• Reserves: Each tuple is 40 bytes long, 100 tuples per page, 1000 pages.
• Sailors: Each tuple is 50 bytes long, 80 tuples per page, 500 pages.

Alternative Plans: Motivating Example
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5

RA Tree (root first): sname; bid=100, rating > 5; sid=sid; Reserves, Sailors.
Plan (root first): sname (On-the-fly); bid=100, rating > 5 (On-the-fly); sid=sid (Simple Nested Loops); Reserves, Sailors.
Costs: Scan Sailors; for each page of Sailors, scan Reserves: 500 + 500*1000 I/Os. Or:
Scan Reserves; for each page of Reserves, scan Sailors: 1000 + 1000*500 I/Os.

Alternative Plans: Motivating Example
• Cost: 500 + 500*1000 I/Os. Almost the worst plan!
• Reasons:
- selections could be "pushed" earlier;
- no use is made of indexes.
• Goal of optimization: to find more efficient plans that compute the same answer.

Alternative Plans 1 (No Indexes)
• Main difference: push selects, reducing the size of the tables to be joined.
• Plan (root first):
- sname (On-the-fly)
- sid=sid (Sort-Merge Join)
- bid=100 (Scan; write to temp T1) on Reserves
- rating > 5 (Scan; write to temp T2) on Sailors
• With 5 buffers, cost of plan:
- Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats, uniform distribution).
- Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings).
- Sort T1 (2*2*10), sort T2 (2*4*250), merge (10+250).
- Total: 4060 page I/Os.
• Optimization 1: block nested loops join: join cost = 10+4*250, total cost = 2770.
• Optimization 2: "push" projections: T1 has only sid, T2 only sid and sname. T1 fits in 3 pages, cost of BNL drops to under 250 pages, total < 2000.

Alternative Plan: Using Index? Push Selections Down?
• What indices help here?
- Index on Reserves.bid?
- Index on Sailors.sid?
- Index on Sailors.rating?
• RA tree (root first): sname; sid=sid; bid=100 on Reserves, rating > 5 on Sailors.

Example Plan: With Index
• With index on Reserves.bid:
- Assume 100 different bid values.
- Assume 100,000 tuples.
- Assume 100 tuples/disk page.
• We get 100,000/100 = 1000 tuples, on 1000/100 = 10 disk pages.
• If the index is clustered, cost = 10 I/Os.
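The page I/O arithmetic above for the no-index plan (4060 and 2770 I/Os) and for the clustered index selection (10 I/Os) can be reproduced with a short Python sketch. The function and constant names are illustrative assumptions, not part of the slides; the page counts, the 5 buffers, and the uniform-distribution assumptions (100 boats, 10 ratings) are the slides' own figures.

```python
import math

# Figures taken from the slides.
RESERVES_PAGES = 1000
SAILORS_PAGES = 500
BUFFERS = 5
T1_PAGES = 10    # Reserves tuples with bid=100 (100 boats, uniform distribution)
T2_PAGES = 250   # Sailors tuples with rating > 5 (10 ratings)


def sort_passes(pages: int, buffers: int) -> int:
    """Passes of external merge sort: pass 0, then (buffers - 1)-way merges."""
    runs = math.ceil(pages / buffers)
    passes = 1
    while runs > 1:
        runs = math.ceil(runs / (buffers - 1))
        passes += 1
    return passes


def selection_cost() -> int:
    """Scan both tables and write the filtered temps T1 and T2."""
    return RESERVES_PAGES + T1_PAGES + SAILORS_PAGES + T2_PAGES


def sort_merge_total() -> int:
    """Selections, then sort both temps (read + write per pass), then merge."""
    sort_t1 = 2 * sort_passes(T1_PAGES, BUFFERS) * T1_PAGES
    sort_t2 = 2 * sort_passes(T2_PAGES, BUFFERS) * T2_PAGES
    merge = T1_PAGES + T2_PAGES
    return selection_cost() + sort_t1 + sort_t2 + merge


def block_nested_loops_total() -> int:
    """Selections, then BNL with T1 as outer, read in blocks of (buffers - 2) pages."""
    blocks = math.ceil(T1_PAGES / (BUFFERS - 2))
    join = T1_PAGES + blocks * T2_PAGES
    return selection_cost() + join


def clustered_index_selection_pages(tuples: int = 100_000,
                                    distinct_bids: int = 100,
                                    tuples_per_page: int = 100) -> int:
    """Pages holding the matching tuples when the index on bid is clustered."""
    matching = tuples // distinct_bids
    return math.ceil(matching / tuples_per_page)


if __name__ == "__main__":
    print("sort-merge join plan:", sort_merge_total())
    print("block nested loops plan:", block_nested_loops_total())
    print("clustered index selection:", clustered_index_selection_pages())
```

The pass counts 2 and 4 in the slides' sort terms (2*2*10 and 2*4*250) fall out of `sort_passes` with 5 buffers, and the factor 4 in 10+4*250 is the number of 3-page blocks needed to cover T1.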
Example Plan Continued
• Plan (root first):
- sname (On-the-fly)
- rating > 5 (On-the-fly)
- sid=sid (Index Nested Loops, with pipelining)
- bid=100 (Use hash index; do not write result to temp) on Reserves; Sailors
• Index on Sailors.sid:
- Join column sid is a key for Sailors.
- At most one matching tuple, so an unclustered index on sid is OK.
• Cost?
- For each Reserves tuple (1000): get the matching Sailors tuple (1.2 I/Os each), i.e. 1000*1.2 = 1200 I/Os.
- Adding the 10 I/Os to retrieve the Reserves tuples: total 1210 I/Os.

Alternative Plan: With Second Index
• Selection pushing down? Push (rating>5) before the join?
• Answer: No, because of the availability of the sid index on Sailors.
• Reason:
- There is no index on the selection result.
- The selection would then require a scan of Sailors.

Summary
• A query is evaluated by converting it to a tree of operators and evaluating the operators in the tree.
• There are several alternative evaluation algorithms for each relational operator.
• Query evaluation must compare alternative plans based on their estimated costs.
• One must understand query optimization in order to fully understand the performance impact of a given database design on a query workload.
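To make the point about comparing plans by estimated cost concrete, the following Python sketch tabulates the I/O estimates quoted in these slides and picks the cheapest. The dictionary keys and the helper name are illustrative assumptions; the page counts, the 4060 and 2770 totals, and the 1.2 I/Os per hash-index probe are the slides' figures. A real optimizer estimates these costs from catalog statistics rather than from hard-coded numbers.

```python
RESERVES_PAGES = 1000
SAILORS_PAGES = 500


def index_nested_loops_cost(selection_pages: int = 10,
                            matching_tuples: int = 1000,
                            probe_io: float = 1.2) -> float:
    """Clustered index selection on Reserves.bid, then one hash-index probe
    into Sailors per qualifying Reserves tuple (the result is pipelined)."""
    return selection_pages + matching_tuples * probe_io


# Estimated page I/Os per plan, as quoted in the slides.
plan_costs = {
    "simple nested loops, Sailors outer": SAILORS_PAGES + SAILORS_PAGES * RESERVES_PAGES,
    "simple nested loops, Reserves outer": RESERVES_PAGES + RESERVES_PAGES * SAILORS_PAGES,
    "push selections + sort-merge join": 4060,
    "push selections + block nested loops": 2770,
    "index nested loops with pipelining": index_nested_loops_cost(),
}

# Choose the plan with the least estimated cost.
best_plan = min(plan_costs, key=plan_costs.get)

if __name__ == "__main__":
    for name, cost in sorted(plan_costs.items(), key=lambda kv: kv[1]):
        print(f"{name}: {cost:,.0f} I/Os")
    print("chosen plan:", best_plan)
```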