Query Tree Disassembly

The Query Tree - 1
• After the Lexical and Syntactical Analyzer has decided that the SQL you have written conforms to the rules
• The Parser then converts the table, column, view etc. names into object codes and hands the parsed query to the ……

The Query Tree - 2
• …… the Optimiser
• The Optimiser then "plans" the query in such a way as to break the complex query down into a series of simple steps
• This series of simple steps is known as the query tree

Plan and Detail - 1
• There is a series of commands that provide increasing levels of detail about the optimiser's plan for a query
• The primary command to generate the plan output is:
      Set Option Query_Plan = ON

Just Query_Plan
• This generates output similar to this:

      Query Plan:
      1 #17: Scrolling Cursor Store
            Child Node 1: #15
      2 .  #15: Order By
            Child Node 1: #14
      3 . .  #14: Group By (Hash)
            Child Node 1: #11
            Total cached join plans: 25
            Total plan cache hits: 64
      4 . . .  #11: Join (Sort-Merge)
            Left Child Node: #12
            Right Child Node: #13
            Valid Join Algorithms: NW, SMJ, HJ, NLPW
            Left Input Table 1: customer
            Right Input Table 1: region
            Right Input Table 2: nation
            Right Input Table 3: supplier
            Right Input Table 4: lineitem
            Right Input Table 5: orders
            Condition 1: (customer.c_nationkey = supplier.s_nationkey)
            Condition 2: (customer.c_custkey = orders.o_custkey)
            Table Row Count: 1500

For more information
• The Query_Detail option provides much richer information in the optimiser output
• Mainly the information concerns what is to be passed (or pushed) around in the query tree
• To generate detail statistics:
      Set Option Query_Plan = ON
      Set Option Query_Detail = ON

Query Detail
• Query_Detail generates output like this:

      Query Plan:
      0 #16: Root
            Child Node 1: #17
            Output Vector: 2 entries (47 data bytes)
            Output 1: nation.n_name
            Output 1 Data Type: CHAR (25, 0)
            Output 2: SACast(SUM(lineitem.l_extendedprice * (1 - lineitem.l_discount)))
            Output 2 Data Type: NUMERIC (35, 4)
      2 .  #15: Order By
            Child Node 1: #14
            Ordering Expression 1: SUM(lineitem.l_extendedprice * (1 - lineitem.l_discount))
            Ordering Expr. 1 Direction: Descending
            Output Vector: 2 entries (49 data bytes)
            Output 1: nation.n_name
            Output 2: SUM(lineitem.l_extendedprice * (1 - lineitem.l_discount))
      3 . .  #14: Group By (Hash)
            Child Node 1: #11
            Grouping Expression 1: nation.n_name
            Total cached join plans: 25
            Total plan cache hits: 64
            Output Vector: 2 entries (49 data bytes)
            Output 1: nation.n_name
            Output 2: SUM(lineitem.l_extendedprice * (1 - lineitem.l_discount))
            Output 2 Data Type: NUMERIC (35, 4)

Everything about the query
• For all the available information about a query from the optimiser, Secure Statistics is what is required
• In IQ 11 this was a hidden option
• In ASIQ-M 12.4.2 it is still sort of hidden

Secure Stats
• Secure Stats provides a very high level of detail about the optimiser plans
• Including the optimiser's estimates of the selectivity and join criteria
• To use Secure Stats:
      Set Option Query_Plan = ON
      Set Option Query_Detail = ON
      Set Option dml_option10 = ON

dml_option10 Output
• dml_option10 generates output like this:

      Query Plan:
      0 #16: Root
            Child Node 1: #17
            Estimated Result Rows: 25
            Production 1: SACast(SUM(lineitem.l_extendedprice * (1 - lineitem.l_discount)))
            Output Vector: 2 entries (47 data bytes)
            Output 1: nation.n_name
            Output 1 Data Type: CHAR (25, 0)
            Output 2: SACast(SUM(lineitem.l_extendedprice * (1 - lineitem.l_discount)))
            Output 2 Data Type: NUMERIC (35, 4)
      1 #17: Scrolling Cursor Store
            Child Node 1: #15
            Estimated Result Rows: 25
            Output Vector: 2 entries (49 data bytes)
            Output 1: nation.n_name
            Output 2: SUM(lineitem.l_extendedprice * (1 - lineitem.l_discount))
      2 .  #15: Order By
            Child Node 1: #14
            Ordering Expression 1: SUM(lineitem.l_extendedprice * (1 - lineitem.l_discount))
            Ordering Expr. 1 Value: May be NULL
            Ordering Expr. 1 Direction: Descending
            Non-Key Data 1: nation.n_name
            Non-Key Data 1 Value: Will never be NULL
            Estimated Result Rows: 25
            Output Vector: 2 entries (49 data bytes)
            Output 1: nation.n_name
            Output 2: SUM(lineitem.l_extendedprice * (1 - lineitem.l_discount))

See the SQL
• New engine startup options for 12.4.2: -zo <filename> and -zr sql
• These options print out all the SQL passed to the optimiser for the duration of the server execution
• Much better than the ODBC trace log!
• These options must be in the config file – not on the server start line

New in 12.4.2
• Option Query_Timing
• When set to ON, this option reports in the query tree output the timing of:
  – The overall query
  – The sub-query timing
  – The timing of every node in the query
• For this to work you must have Query_Plan_After_Run set to ON (for obvious reasons!)

Query Timing
• The node-protocol events that are timed:
  – Request from higher node to start processing
  – Answer from tree: tree complete
  – Request for first row
  – Ack. of first row passed up
  – Ack. of last row passed up
  – Answer from tree: all resources returned
• Sample output:
      Time of First Prepare        : 11:15:57.797306
      Time of First Prepare Return : 11:15:57.919217
      Time of First Fetch Call     : 11:15:57.919574
      Time of First Fetch Return   : 11:16:01.290265
      Time of Last Fetch           : 11:16:01.292390
      Time of Last Complete        : 11:16:01.326976

New in 12.4.3
• Option Query_Plan_As_HTML
• This option generates the query plan as an HTML document – stored in the execution directory of the server
• For this to work the option Query_Name must be set
• The output name of the file is then: <UserName>_<Query_Name>.htm
      e.g. Dba_Query7.htm

Execution Output
• Of course, the options just described only cover what happens prior to query execution
• The Run Time Engine also generates output
• This output is the Command Statistics

Command Statistics
• Command Statistics are now documented – sort of! You have to set the COMMAND_STATS option, then use sp_iqcommandstats
• During command execution, statistics are collected
• Note this is not the case for Set, Dump, Restore etc.
• Stats. are kept in memory until the next time COMMAND_STATS is set to ON

Statistics collected
• Original Command String
• Total Execution Time
• State of all Buffer Managers, before and after
• The set of indexes used, and why
• Rows returned
• Sub-query re-execution count
• Average time to execute sub-query
• Maximum time to execute sub-query
• etc...

Command Statistics
• Command Statistics are provided by:
      Set temporary option command_stats = ON     (before the query is run)
      sp_iqcommandstats n                         (after the query is run)
  – n = 1 provides summary info (1 page)
  – n = 2 provides medium detail (suggested) (4 pages)
  – n = 3 provides max detail (really only for Tech. Support) (12 pages)

Command Statistics Output

      Statistic                                 Value
      Demandable stats before query execution   4
      bufman:                                   temp Bufman
        Total Finds:                            1539
        Find Hits:                              1533
        Total Creates:                          202
        Total Destroys:                         200
        Total Dirtys:                           353
        Total PrefetchExecuted:                 0
        Physical Reads:                         6
        Physical Writes:                        0
        TotalBuffersEverUsed:                   208
      bufman:                                   main Bufman
        Total Finds:                            1415
        Find Hits:                              1245
        Total Creates:                          0
        Total Destroys:                         0
        Total Dirtys:                           3
        Total PrefetchExecuted:                 6
        Physical Reads:                         172
        Physical Writes:                        1
        TotalBuffersEverUsed:                   172
      Command Execution Time (seconds)          not used
      Command Id                                0
      Command Type                              72
      Stats Repository Unsolicited StatsObjs    45
      Stats Repository Demanded StatsObjs       0
      Command Execution Time (seconds)          4
      Command Id                                0
      Command Type                              73
      Stats Repository Unsolicited StatsObjs    51
      Stats Repository Demanded StatsObjs       1
      DFO #5 Node Type                          Root
      DFO #5 Child Node 1                       #06
      DFO #5 Output Vector                      2 entries (12 data bytes)
      DFO #5 Output 1                           supplier.s_suppkey
      DFO #5 Output 1 Data Type                 INTEGER (10, 0)
      DFO #5 Output 2                           COUNT(partsupp.ps_partkey)
      DFO #5 Output 2 Data Type                 UINT64 (20, 0)
      DFO #5 Generated Result Rows              57

So you have a Query
• All of the information relating to the Optimiser and Execution Engine relates to TPCD Query 5, which is:

      SELECT   n_name,
               SUM(l_extendedprice * (1 - l_discount))
      FROM     customer, orders, lineitem, supplier, nation, region
      WHERE    c_custkey   =  o_custkey
      AND      o_orderkey  =  l_orderkey
      AND      l_suppkey   =  s_suppkey
      AND      c_nationkey =  s_nationkey
      AND      s_nationkey =  n_nationkey
      AND      n_regionkey =  r_regionkey
      AND      r_name      =  'ASIA'
      AND      o_orderdate >= '1994-01-01'
      AND      o_orderdate <  dateadd(year, 1, '1994-01-01')
      GROUP BY n_name
      ORDER BY 2 DESC;

Building the Query Tree
• The first task that I perform on checking out a query is to build the query tree (or run with HTML on)
• This is a sort-of upside-down tree with all the leaves at the base and the root at the top
• Result rows flow up from the leaves to the root. Hence the correct name of the query tree is the dataflow model

Query Plan
• Query plan output is mapped to a "tree"
• The query tree consists of "nodes"
  – Each node represents a stage of work
  – Each node has a Name and a DFO Number (DFO = Data Flow Operator)
  – The "top" of the tree is the ROOT node (highest node #)
  – The lowest nodes are the "leaves"

Dataflow Model

            Root
             |
          Group By
             |
            Join
           /    \
      Table 1  Table 2

Nodes in the Query Plan - 1
• The level number represents the "depth" in the query tree:

      Query Plan:
      1 #17: Scrolling Cursor Store
            Child Node 1: #15
      2 .  #15: Order By
            Child Node 1: #14
      3 . .  #14: Group By (Hash)
            Child Node 1: #11
            Total cached join plans: 25
            Total plan cache hits: 64
      4 . . .  #11: Join (Sort-Merge)
            Left Child Node: #12
            Right Child Node: #13
            Valid Join Algorithms: NW, SMJ, HJ, NLPW
            Left Input Table 1: customer
            Right Input Table 1: region
            Right Input Table 2: nation
            Right Input Table 3: supplier
            Right Input Table 4: lineitem
            Right Input Table 5: orders
            Condition 1: (customer.c_nationkey = supplier.s_nationkey)
            Condition 2: (customer.c_custkey = orders.o_custkey)
            Table Row Count: 1500

Nodes in the Query Plan - 2
• The DFO Node Number is a unique number that identifies the node
• The "chaining" down the tree is shown by the Child Node numbers:

      Query Plan:
      1 #17: Scrolling Cursor Store
            Child Node 1: #15
      2 .  #15: Order By
            Child Node 1: #14
      3 . .  #14: Group By (Hash)
            Child Node 1: #11
            Total cached join plans: 25
            Total plan cache hits: 64
      4 . . .  #11: Join (Sort-Merge)
            Left Child Node: #12
            Right Child Node: #13
            Valid Join Algorithms: NW, SMJ, HJ, NLPW
            Left Input Table 1: customer
            Right Input Table 1: region
            Right Input Table 2: nation
            Right Input Table 3: supplier
            Right Input Table 4: lineitem
            Right Input Table 5: orders
            Condition 1: (customer.c_nationkey = supplier.s_nationkey)
            Condition 2: (customer.c_custkey = orders.o_custkey)
            Table Row Count: 1500

Dataflow
• Nodes request data from the node below
• Data is sent up the tree

            17
            |
            15
            |
            14
            |
            11
           /  \
         12    13

Sort Merge Pushdown
• The section below shows the push down section (TPCD Query 9):

      Condition 3 (Pushed): (lineitem.l_partkey PROBABLY_IN BV(0, part.p_partkey))
      Condition 3 Selectivity: 0.15000000
      Condition 3 Index: FP ASIQ_IDX_T193_C2_FP
      Join comparator: non-null identically typed key
      PD Filter page count: 2
      PD Filter EQ classes possible: 32768
      PD Filter EQ classes present: 97
      PD Filter EQ class bit density: 0.00296021
      PD Filter input row count: 60175
      PD Filter output row count: 2901
      PD Filter selectivity: 0.04820939
      PD Filter est. selectivity: 0.15000000

Node Types
• All of the nodes in the query plan have a type
• There are three main types of node:
  – Vertical Cursors and Filters
  – Aggregation Nodes (Group By nodes)
  – Join Nodes
• Each of these node types is discussed further below

Definitions
• Local Predicates
  – These are conditions in the WHERE clause which access only one table.
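To make the local-predicate idea concrete, here is a small Python sketch. This is not IQ code and the names (`bitmap`, the sample columns) are invented for illustration; it only mimics how a column store can resolve single-table predicates as row bitmaps that are ANDed together before any row data is materialised:

```python
# Conceptual sketch only -- not Sybase IQ internals. One bitmap per local
# predicate (one bit per row), combined with AND to find qualifying rows.

def bitmap(column, pred):
    """Evaluate a single-column (local) predicate into a row bitmap."""
    return [1 if pred(v) else 0 for v in column]

# A tiny "customer" table stored column-wise (invented sample data).
account_balance = [100, 7000, 250, 4999, -5, 5000]
account_type    = [1,   9,    3,   2,    1,  4]

# Local predicates from a WHERE clause:
#   account_balance > 0 AND account_balance < 5000 AND account_type IN (1,2,3)
b1 = bitmap(account_balance, lambda v: v > 0)
b2 = bitmap(account_balance, lambda v: v < 5000)
b3 = bitmap(account_type,    lambda v: v in (1, 2, 3))

# AND the bitmaps; only the surviving row ids ever need to be fetched.
result = [a & b & c for a, b, c in zip(b1, b2, b3)]
qualifying_rows = [i for i, bit in enumerate(result) if bit]
print(qualifying_rows)   # -> [0, 2, 3]
```

The point of the sketch is that each predicate touches exactly one column, so it can be answered from that column's index alone, and the combination step works on bitmaps rather than rows.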
• There are two "types" of these nodes:
  – Vertical Cursor Nodes
  – Filter Nodes

Vertical Cursor
• A vertical cursor is the part of a query that can be solved directly using IQ indexes
  – If there are local predicates, then these can be solved through selective filtering using the indexes directly
  – The multiple indexes per column allow IQ to use this at the base of ALL table searches
• After processing the local predicates, the list of rows is "projected" up the index tree as elements of the Fast Projection (FP) index for the column concerned
• After this point IQ processes the remainder of the query from the result set of FP data for the required columns

Vertical Cursor Node
• Used when an index can satisfy the WHERE clause predicate
• Bitmaps are read directly
      Example: WHERE avail_qty > 100
• There are three complex vertical cursors:
  – Vertical Aggregation Cursor
  – Vertical Grouping Cursor
  – Vertical Distinct Cursor
  (These are generated by some functions, or by GROUP BY or DISTINCT clauses)

Filter Node - 1
• Filter Node
  – Used when bitmaps alone cannot satisfy the query
  – Becomes an "index scan" to resolve the query
        WHERE avail_qty * 5 = 20
      or
        WHERE datepart(mm, order_date) = 1
• Check the SQL – the clause may be able to be changed
• This may be an area where changing the query could cause a substantial performance improvement

Filter Node - 2
• In IQ 12 the optimiser can perform two types of predicate processing at the index level
• LIKE predicates can be (and are) pushed down into the index for processing
• This ability is also true for very large IN lists (> 1,000 values)
  – In IQ 12.4 the decision has been moved to a very much greater IN-list size for index processing (around 16,000 entries)

IQ Table Scan
• If a query plan has a filter above a vertical cursor, this is an IQ table scan
• This can be, and usually is, expensive
• If the SQL can be modified to get the optimiser to "push" the filter down into the vertical
cursor, then the query should speed up

Predicate Diagnostics

      1 #03: Group By Single
            Child Node 1: #02
      2 .  #02: Filter                 (the Filter requires a "scan")
            Child Node 1: #01
            Condition 1: (customer.account_type IN (1,2,3,4,5,6))
      3 . .  #01: Vertical Cursor      (satisfied by an index)
            Table Name: customer
            Table Row Count: 5000
            Condition 1: (customer.account_balance > 0)
            Condition 2: (customer.account_balance < 5000)

• This is an "old" example – in IQ 12 the IN list would now be "pushed" down to the vertical cursor

Indexes and Predicates
• Certain index types are better for certain predicates:

      Operation   FP     LF         HG       HNG
      =           slow   fast       fast     medium
      <, >        slow   med/fast   fast     fast
      between     slow   med/fast   fast     fast
      IN          slow   fast       fast     medium
      LIKE        slow   medium     medium   medium
      IS NULL     fast   fast       fast     fast

Inferred Predicates - 1
• Consider the case:
      Select count(*) from TABLE
      where TABLE.a = 10 and TABLE.a = TABLE.b
• Here IQ will apply the rule of transitive closure [if a = b and b = c then a = c]
• And add the clause: and TABLE.b = 10
• This gives the optimiser another clause to help solve the query

Inferred Predicates - 2
• This can also help a join
• Consider the case:
      Select count(*) from TABLE1, TABLE2, TABLE3
      where TABLE1.key = TABLE2.key and TABLE2.key = TABLE3.key
• IQ will then add the clause: and TABLE1.key = TABLE3.key
• This gives the optimiser another clause to help solve the query

IN List vs. OR?
• In IQ-M the following statements are processed in the same way:
      Select * from T where T.a IN (1,2,3,4)
      Select * from T where T.a = 1 or T.a = 2 or T.a = 3 or T.a = 4
• This means that there is no real requirement to remove "or" processing from SQL
• This favors third-party tools – as "or" queries are usually easier to write from query generators

Range Predicates
• In IQ 11 the following were the same:
      Where table.date >= '1-Jan-1997' and table.date < '1-Jan-1998'
  and
      Where table.date between '1-Jan-1997' and '31-Dec-1997'
• New in 12 is the optimization that:
      Where table.date >= '1-Jan-1997' and table.date < '1-Jan-1998'
  is the same as
      Where table.date between '1-Jan-1997' and '1-Jan-1998'   excluding rows on '1-Jan-1998'
• Again this gives the optimiser more potential "routes" to solve the query

Predicate Factoring
• Consider:
      SELECT * FROM T1, T2
      WHERE T1.key = T2.key
      AND  (T1.nation = 'Spain' AND T1.food = 'apple' AND T2.name = 'Chen')
       OR  (T1.nation = 'India' AND T1.food = 'apple' AND T2.name = 'Chen' AND T2.car = 'Fiat')
• With the above, the optimiser cannot process the local predicates before the join – so we have to do the join, then filter, negating the index advantages of IQ
• So the optimiser can now change the query to:
      SELECT * FROM T1, T2
      WHERE T1.key = T2.key
      AND  (T1.nation = 'Spain' OR (T1.nation = 'India' AND T2.car = 'Fiat'))
      AND  T1.food = 'apple'
      AND  T2.name = 'Chen'
• Now the optimiser can easily decide that food and name can be processed in the indexes
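A quick way to convince yourself that the factored form is equivalent to the original is to evaluate both predicate forms over all combinations of some sample values. This is an illustrative Python check (the functions are invented for the demonstration, not anything IQ runs):

```python
# Illustrative check -- not IQ code. The original OR-of-ANDs predicate and the
# factored form the optimiser rewrites it to must accept exactly the same rows.
import itertools

def original(nation, food, name, car):
    return (nation == 'Spain' and food == 'apple' and name == 'Chen') or \
           (nation == 'India' and food == 'apple' and name == 'Chen' and car == 'Fiat')

def factored(nation, food, name, car):
    # Common conjuncts food = 'apple' and name = 'Chen' pulled out of the OR.
    return (nation == 'Spain' or (nation == 'India' and car == 'Fiat')) and \
           food == 'apple' and name == 'Chen'

nations, foods, names, cars = (['Spain', 'India', 'UK'], ['apple', 'pear'],
                               ['Chen', 'Li'], ['Fiat', 'Ford'])
for row in itertools.product(nations, foods, names, cars):
    assert original(*row) == factored(*row)
print("both predicate forms accept the same rows")
```

Because `food = 'apple'` and `name = 'Chen'` now stand alone as conjuncts, each is a local predicate that can be resolved in the indexes before the join, which is exactly the benefit the slide describes.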
Predicate Parameters
• Force_All_Predicates_To_Postfilters (only in 12.4.2)
  – This forces IN lists to a post filter; if set, this may not be too useful
• dml_options2 (bit field option – beware)
  – Bit 0 set: ALL IN lists are forced to bitmapped indexes
  – Bit 1 set: ALL IN lists are forced to FP indexes
  – Neither bit 0 nor bit 1 set: IQ will choose based on the index speed and the size of the IN list

Selectivity - 1
• As we can see from the queries, sometimes the optimiser gets the selectivity wrong
  – Because it does not have an enumerated index (1-byte FP, 2-byte FP, LF or HG)
        Where col1 < 100    and col1 only has a flat FP and HNG
  – Because the predicate is complex
        Like, substr(), datepart()
  – Or where there is a correlation between two columns
        x > 100 and y > 50

Selectivity - 2
• In any of the above cases we can apply a selectivity force:
      Where (x > 200, 22)
  – This says that, in the above case, 22% of the rows have a value of x > 200
• For like, substr() and datepart() this is wonderful

Correlated Columns - 1
• This is the case where two columns are related, but the optimiser cannot know
      x > 100 and y > 50    (for example, x will always be twice as big as y)
  – In this case the columns may both have a (true) selectivity of 10%; the optimiser assumes no correlation, so the overall selectivity is determined to be 1% (10% of 10%)
  – This may not be correct (and in this case is not)

Correlated Columns - 2
• In this case we can write the forced selectivity as:
      (x > 100, 100) and y > 50
• Note we leave the selectivity on y at 10% (the optimiser-determined value)
• And we change the selectivity on x to 100%
• The optimiser then determines the correct selectivity for the table as 10%

Sort Elimination
• Under certain circumstances the optimiser can remove sorts (if the correct ordering applies)
  (Diagram: two versions of the query tree 17–15–14–11–{12, 13}; in the second, the sort at the Order By node is eliminated because the required ordering on A,B is already delivered by the nodes below it.)
Agg. Node Types - Vertical
• Vertical Grouping Cursor – the grouping is done in the indexes
• Vertical Distinct Cursor – a SELECT DISTINCT that can be satisfied by an index
• Vertical Aggregation Cursor – generated if a query has certain functions in the select list which can be processed vertically (MAX, MIN)

Agg. Node Types - Group By 1
• Group By Single
  – An aggregation function with no GROUP BY, which must have a unitary result set
  – E.g. COUNT(*)
• For COUNT(DISTINCT), AVG() and SUM() operations a hashing function is used
• For horizontal aggregation – count(distinct brand) ... group by store
  – The grouping is performed using a sort
  – The count is performed by hash

Agg. Node Types - Group By 2
• Explicit GROUP BY clauses that are not processed vertically generate a "group by" node
• Group By – groups data with a sort
• Group By (Hash) – groups data with a hashing function

Agg. Node Types - Distinct
• A SELECT DISTINCT which cannot be processed vertically generates one of:
• Distinct (Hash) node
  – Used with SELECT DISTINCT
  – Uses a hashing function to find each distinct value
• Distinct node
  – Used with SELECT DISTINCT
  – Uses a sort to find each distinct value

(Non) Sorting Data
• A GROUP BY or SELECT DISTINCT will not sort data by default
• Output order may vary on different runs
• An ORDER BY is required to guarantee sorted output – if required

Aggregation Parameters - 1
• Aggregation_Preference
  – Default 0: allow the optimiser to choose
  – 1: prefer aggregation with a sort; -1: avoid a sort (if possible)
  – 2: prefer vertical aggregation; -2: avoid vertical aggregation
  – 3: prefer aggregation with a hash; -3: avoid aggregation with a hash

Aggregation Parameters - 2
• Aggregation_Index_Cutover (default 2000)
  – If the estimated number of groups is greater than this, ASIQ-M will not choose to group vertically (using an index – usually HG)
  – Instead the optimiser will use a sort operation to provide the aggregation

Query Tree - End
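As a closing illustration, the two grouping strategies the plan nodes represent – Group By (sort-based) and Group By (Hash) – can be sketched in a few lines of Python. This is a conceptual sketch, not IQ internals; note that, as the slides say, neither strategy guarantees any particular output order without an ORDER BY:

```python
# Conceptual sketch of the two grouping strategies (not IQ internals).

def group_by_sort(rows):
    """Group By: sort on the grouping key, then count each run of equal keys."""
    out, run_key, run_count = [], None, 0
    for key in sorted(rows):
        if key != run_key and run_count:
            out.append((run_key, run_count))
            run_count = 0
        run_key = key
        run_count += 1
    if run_count:
        out.append((run_key, run_count))
    return out

def group_by_hash(rows):
    """Group By (Hash): one pass, accumulating counts in a hash table."""
    counts = {}
    for key in rows:
        counts[key] = counts.get(key, 0) + 1
    return list(counts.items())

rows = ['ASIA', 'EUROPE', 'ASIA', 'AFRICA', 'ASIA', 'EUROPE']
# Same groups either way; only the (unguaranteed) output order may differ.
print(sorted(group_by_sort(rows)))   # -> [('AFRICA', 1), ('ASIA', 3), ('EUROPE', 2)]
print(sorted(group_by_hash(rows)))   # -> [('AFRICA', 1), ('ASIA', 3), ('EUROPE', 2)]
```

The trade-off mirrors the Aggregation_Preference and Aggregation_Index_Cutover discussion: a sort pays O(n log n) but streams groups out in key order, while a hash is a single pass whose cost grows with the number of distinct groups it must hold.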