
Conference Highlights
More than 700 technical sessions
More than 110 hands-on labs
Industry-focused business and IT leadership sessions
Approximately 300 client and Business Partner speakers
IBM’s largest EXPO – 350+ exhibits
Visit: http://www-01.ibm.com/software/data/2013-conference/
Upcoming Events
IBM Infobahn
• New Jersey: 9/5/2013
• Los Angeles: 9/12/2013
• Washington DC: 10/10/2013
• Dallas: 10/17/2013
• Chicago: 10/22/2013
• Atlanta: 11/11/2013
For more information, please contact Anita McKeithen ([email protected])
Upcoming Events
Local User Group Meetings
Query Optimizer Enhancement in Informix 12.1
Bingjie Miao
IBM
© 2013 IBM Corporation
Agenda
• sqexplain overview
• Set operations
• View folding enhancements
• Subquery flattening after view folding
• ANSI OUTER JOIN to Informix outer join transformation
• Hash join support for ANSI JOIN queries
• Optimizer costing enhancement for hash join
• Temp table optimization
• Better handling of the date() wrapper
• Predicate derivation for ANSI JOIN queries
sqexplain Overview
• Prints query plan information
• Includes runtime statistics
• Ways to turn on explain:
– set explain on;
– set explain file to 'file_name';
– set explain on avoid_execute; (plan only, the query is not executed)
– EXPLAIN optimizer directive in a query
Sections in sqexplain
The output has three parts: the query text, general query information, and the access paths and joins.
QUERY: (OPTIMIZATION TIMESTAMP: 03-07-2013 17:28:30)
-----
select {+ FULL(tab1) AVOID_FULL(tab2)} *
from tab1, tab2 where tab1.id = tab2.id
DIRECTIVES FOLLOWED:
FULL ( tab1 )
AVOID_FULL ( tab2 )
DIRECTIVES NOT FOLLOWED:
Estimated Cost: 4
Estimated # of Rows Returned: 1
1) informix.tab1: SEQUENTIAL SCAN
2) informix.tab2: INDEX PATH
(1) Index Name: informix.t2idx1
Index Keys: id
(Serial, fragments: ALL)
Lower Index Filter: informix.tab1.id = informix.tab2.id
NESTED LOOP JOIN
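When comparing many plans it can help to pull the general query information and access paths out of sqexplain text mechanically. The following is a minimal Python sketch that assumes only the line formats visible in the sample above ("Estimated Cost:", "Estimated # of Rows Returned:", numbered "n) table: METHOD" lines and join-method lines); real sqexplain files contain more line types, and the function name parse_plan is hypothetical.

```python
import re

# Line formats assumed from the sample sqexplain output above.
COST_RE = re.compile(r"^Estimated Cost:\s*(\d+)")
ROWS_RE = re.compile(r"^Estimated # of Rows Returned:\s*(\d+)")
ACCESS_RE = re.compile(r"^\d+\)\s+(.+?):\s+(.+)$")   # e.g. "1) informix.tab1: SEQUENTIAL SCAN"
JOIN_RE = re.compile(r"^(NESTED LOOP JOIN|DYNAMIC HASH JOIN)")


def parse_plan(text):
    """Return estimated cost, estimated rows, access paths and join lines."""
    plan = {"cost": None, "rows": None, "access": [], "joins": []}
    for raw in text.splitlines():
        line = raw.strip()
        if m := COST_RE.match(line):
            plan["cost"] = int(m.group(1))
        elif m := ROWS_RE.match(line):
            plan["rows"] = int(m.group(1))
        elif m := ACCESS_RE.match(line):
            plan["access"].append((m.group(1), m.group(2)))
        elif JOIN_RE.match(line):
            plan["joins"].append(line)
    return plan


sample = """Estimated Cost: 4
Estimated # of Rows Returned: 1
1) informix.tab1: SEQUENTIAL SCAN
2) informix.tab2: INDEX PATH
(1) Index Name: informix.t2idx1
NESTED LOOP JOIN"""
print(parse_plan(sample))
```

The non-greedy table pattern also accepts names with spaces, such as "(Temp Table For View)".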
Sections in sqexplain – cont.
Runtime query statistics section:
Query statistics:
-----------------
Table map :
----------------------------
Internal name     Table name
----------------------------
t1                tab1
t2                tab2

type     table  rows_prod  est_rows  rows_scan  time       est_cost
-------------------------------------------------------------------
scan     t1     1000       1         1000       00:00.00   2

type     table  rows_prod  est_rows  rows_scan  time       est_cost
-------------------------------------------------------------------
scan     t2     1000       1         1000       00:00.01   0

type     rows_prod  est_rows  time       est_cost
-------------------------------------------------
nljoin   1000       1         00:00.01   4
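In these statistics est_rows (1) is far below rows_prod (1000) for both scans, which often suggests that table statistics are out of date (for example, UPDATE STATISTICS has not been run since the data was loaded). A minimal Python sketch for flagging such rows, assuming the whitespace-separated scan-row layout shown above; the ratio threshold of 10 and the function name are arbitrary choices for illustration.

```python
def flag_misestimates(stat_lines, ratio=10.0):
    """Return (table, est_rows, rows_prod) for scan rows off by at least `ratio`."""
    flagged = []
    for line in stat_lines:
        fields = line.split()
        # Assumed scan-row layout: type table rows_prod est_rows rows_scan time est_cost
        if len(fields) != 7 or fields[0] != "scan":
            continue
        table = fields[1]
        rows_prod, est_rows = int(fields[2]), int(fields[3])
        # max(..., 1) keeps the division safe when a count is 0
        hi, lo = max(rows_prod, est_rows), max(min(rows_prod, est_rows), 1)
        if hi / lo >= ratio:
            flagged.append((table, est_rows, rows_prod))
    return flagged


stats = [
    "scan t1 1000 1 1000 00:00.00 2",
    "scan t2 1000 1 1000 00:00.01 0",
    "nljoin 1000 1 00:00.01 4",          # join rows are skipped by this sketch
]
print(flag_misestimates(stats))  # [('t1', 1, 1000), ('t2', 1, 1000)]
```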
sqexplain for ANSI JOIN query
QUERY:
-----
Select * from (t1 left outer join
(t2 left outer join t3 on t2.c1=t3.c1) on t1.c2=t2.c2 and t2.c1 < 5)
left outer join t4 on t1.c1=t4.c1
Estimated Cost: 14
Estimated # of Rows Returned: 4
1) informix.t1: SEQUENTIAL SCAN
2) informix.t2: INDEX PATH
Filters: informix.t2.c1 < 5
(1) Index Keys: c2
(Serial, fragments: ALL)
Lower Index Filter: informix.t1.c2 = informix.t2.c2
3) informix.t3: AUTOINDEX PATH
(1) Index Keys: c1
Lower Index Filter: informix.t2.c1 = informix.t3.c1
ON-Filters:informix.t2.c1 = informix.t3.c1
NESTED LOOP JOIN(LEFT OUTER JOIN)
ON-Filters:(informix.t1.c2 = informix.t2.c2 AND informix.t2.c1 < 5 )
NESTED LOOP JOIN(LEFT OUTER JOIN)
4) sqlqa.t4: AUTOINDEX PATH
(1) Index Keys: c1
Lower Index Filter: informix.t1.c1 = informix.t4.c1
ON-Filters:informix.t1.c1 = informix.t4.c1
NESTED LOOP JOIN(LEFT OUTER JOIN)
Set Operations
• Used like UNION, combining the results of two SELECT arms
• INTERSECT – rows common to both arms
– internally transformed into an EXISTS subquery with special NULL handling
• MINUS or EXCEPT – rows in the first arm that are not in the second arm
– internally transformed into a NOT EXISTS subquery with special NULL handling
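The transformation can be checked on a small data set. This minimal sketch runs against SQLite through Python's sqlite3 module, not Informix: SQLite's null-safe IS comparison stands in for the "special NULL handling", which we assume corresponds to the == filters in the Informix explain output for set operations. Table and column names follow the explain examples; the data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (intcol INTEGER);
    CREATE TABLE tab2 (intcol2 INTEGER);
    INSERT INTO tab1 VALUES (1), (2), (2), (NULL), (4);
    INSERT INTO tab2 VALUES (2), (NULL), (3);
""")


def rows(sql):
    # Compare results as sets; tuples containing None are hashable
    return set(conn.execute(sql).fetchall())


intersect = rows("SELECT intcol FROM tab1 INTERSECT SELECT intcol2 FROM tab2")
# Semi join: EXISTS with a null-safe comparison (SQLite's IS)
semi = rows("""SELECT DISTINCT intcol FROM tab1
               WHERE EXISTS (SELECT 1 FROM tab2 WHERE tab2.intcol2 IS tab1.intcol)""")
# A plain '=' never matches NULL, so the NULL row would be lost
semi_plain = rows("""SELECT DISTINCT intcol FROM tab1
                     WHERE EXISTS (SELECT 1 FROM tab2 WHERE tab2.intcol2 = tab1.intcol)""")

minus = rows("SELECT intcol FROM tab1 EXCEPT SELECT intcol2 FROM tab2")
# Anti semi join: NOT EXISTS with the same null-safe comparison
anti = rows("""SELECT DISTINCT intcol FROM tab1
               WHERE NOT EXISTS (SELECT 1 FROM tab2 WHERE tab2.intcol2 IS tab1.intcol)""")

print(intersect == semi, semi_plain)  # True {(2,)}
print(minus == anti)                  # True
```

Note the DISTINCT: set operators return distinct rows, so the rewritten forms must remove duplicates too.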
Set Operations in explain
QUERY: (OPTIMIZATION TIMESTAMP: 03-08-2013 15:04:22)
-----
select intcol from tab1
intersect
select intcol2 from tab2
Estimated Cost: 4
Estimated # of Rows Returned: 1
1) informix.tab1: SEQUENTIAL SCAN
2) informix.tab2: SEQUENTIAL SCAN
(First Row)
Filters: informix.tab1.intcol == informix.tab2.intcol2
NESTED LOOP JOIN (Semi Join)
Set Operations in explain – cont.
QUERY: (OPTIMIZATION TIMESTAMP: 03-08-2013 15:13:28)
-----
select intcol, charcol from tab1
intersect
select intcol2, charcol2 from tab2
minus
select intcol3, charcol3 from tab3
Estimated Cost: 6
Estimated # of Rows Returned: 1
1) informix.tab1: SEQUENTIAL SCAN
2) informix.tab2: SEQUENTIAL SCAN
(First Row)
Filters: (informix.tab1.intcol == informix.tab2.intcol2 AND
informix.tab1.charcol == informix.tab2.charcol2 )
NESTED LOOP JOIN (Semi Join)
3) informix.tab3: SEQUENTIAL SCAN
(First Row)
Filters: (informix.tab1.charcol == informix.tab3.charcol3 AND
informix.tab1.intcol == informix.tab3.intcol3 )
NESTED LOOP JOIN (Anti Semi Join)
View folding enhancement
• Views containing an ANSI JOIN or an Informix outer join can now be folded into the main query for better performance
– the view must be referenced as a dominant table in the main query
– if the view is used as a subservient table, it still needs to be materialized first
View folding Example
create view v1(vc1, vc2) as
select t1.c, t2.c
from t1 left join t2 on t2.a = t1.a;
select *
from v1 left join t3 on v1.vc1 = t3.a;
select *
from v1 right join t3 on v1.vc1 = t3.a;
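Folding is a query rewrite: in the LEFT JOIN query v1 is dominant, so its definition can be merged into the main query, with v1.vc1 replaced by t1.c. A minimal sketch, using SQLite through Python's sqlite3 rather than Informix, that checks the folded form returns the same rows; the extra columns and all data are assumptions made up for the demo. The RIGHT JOIN query, where v1 is subservient and is materialized instead, is not modeled.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER, c INTEGER);
    CREATE TABLE t2 (a INTEGER, c INTEGER);
    CREATE TABLE t3 (a INTEGER);
    INSERT INTO t1 VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO t2 VALUES (1, 100), (1, 101);
    INSERT INTO t3 VALUES (10), (30), (99);
    CREATE VIEW v1 (vc1, vc2) AS
        SELECT t1.c, t2.c FROM t1 LEFT JOIN t2 ON t2.a = t1.a;
""")

# Query written against the view
via_view = conn.execute(
    "SELECT * FROM v1 LEFT JOIN t3 ON v1.vc1 = t3.a").fetchall()

# Same query with the view definition folded in (v1.vc1 -> t1.c);
# LEFT JOINs associate left to right, so no parentheses are needed
folded = conn.execute("""
    SELECT t1.c, t2.c, t3.a
    FROM t1 LEFT JOIN t2 ON t2.a = t1.a
            LEFT JOIN t3 ON t1.c = t3.a
""").fetchall()

# Compare as multisets, since row order is not guaranteed
print(Counter(via_view) == Counter(folded))  # True
```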
View folding in sqexplain
QUERY: (OPTIMIZATION TIMESTAMP: 03-12-2013 16:23:41)
-----
select * from v1 left join t3 on v1.vc1 = t3.a
Estimated Cost: 6
Estimated # of Rows Returned: 3
1) informix.t1: SEQUENTIAL SCAN
2) informix.t2: INDEX PATH
(1) Index Name: informix.ind2
Index Keys: a
(Serial, fragments: ALL)
Lower Index Filter: informix.t2.a = informix.t1.a
NESTED LOOP JOIN
3) informix.t3: INDEX PATH
(1) Index Name: informix.ind3
Index Keys: a
(Serial, fragments: ALL)
Lower Index Filter: informix.t1.c = informix.t3.a
NESTED LOOP JOIN
View folding in sqexplain – cont.
QUERY: (OPTIMIZATION TIMESTAMP: 03-12-2013 16:28:47)
-----
create view "informix".v1 (vc1,vc2) as select x0.c ,x1.c from ("informix".t1 x0
left join "informix".t2 x1 on (x1.a = x0.a ) );
Estimated Cost: 4
Estimated # of Rows Returned: 3
1) informix.t1: SEQUENTIAL SCAN
2) informix.t2: INDEX PATH
(1) Index Name: informix.ind2
Index Keys: a
(Serial, fragments: ALL)
Lower Index Filter: informix.t2.a = informix.t1.a
NESTED LOOP JOIN
QUERY: (OPTIMIZATION TIMESTAMP: 03-12-2013 16:28:47)
-----
select * from v1 right join t3 on v1.vc1 = t3.a
Estimated Cost: 5
Estimated # of Rows Returned: 3
1) informix.t3: SEQUENTIAL SCAN
2) (Temp Table For View): SEQUENTIAL SCAN
DYNAMIC HASH JOIN
Dynamic Hash Filters: (Temp Table For View).vc1 = informix.t3.a
Subquery flattening after view folding
• Subquery flattening improves query performance; however, it was previously disabled if the query contained a view or derived table reference
• In 12.1, subquery flattening is attempted again after the view folding process, and can be done with the view either folded or materialized into a temp table
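Flattening turns an EXISTS subquery into a semi join, which appears in the plans as "NESTED LOOP JOIN (Semi Join)" together with "(First Row)". A minimal Python sketch of a nested-loop semi join, assuming we read "(First Row)" as "stop scanning the inner table at the first match"; the function name and data are hypothetical, loosely modeled on the v4/t2 example.

```python
def nested_loop_semi_join(outer, inner, match):
    """Return each outer row that has at least one matching inner row."""
    result = []
    for o in outer:
        for i in inner:
            if match(o, i):
                result.append(o)
                break  # first matching row found: skip the rest of inner
    return result


# Hypothetical data: v4_c1 = t1_c1 + 1 after GROUP BY, and t2_c1 values
t1_c1 = [1, 2, 2, 5]
v4 = sorted({c + 1 for c in t1_c1})   # [2, 3, 6]
t2 = [3, 3, 6, 7]

semi = nested_loop_semi_join(v4, t2, lambda v, t: t == v)
inner = [v for v in v4 for t in t2 if t == v]   # a plain inner join, for contrast
print(semi)   # [3, 6]
print(inner)  # [3, 3, 6]
```

The contrast shows why flattening needs semi join semantics: a plain join would return the outer row 3 twice, while EXISTS returns it once.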
Subquery flattening after view folding in sqexplain
create view v4 (v4_c1, v4_c2) as
select t1_c1 + 1, MAX(1) from t1 group by 1;
QUERY: (OPTIMIZATION TIMESTAMP: 03-12-2013 16:52:44)
-----
select 1 from v4 where exists (select 1 from t2 where t2_c1 = v4_c1)
Estimated Cost: 6
Estimated # of Rows Returned: 1
Temporary Files Required For: Group By
1) informix.t1: SEQUENTIAL SCAN
2) informix.t2: SEQUENTIAL SCAN
(First Row)
Filters: informix.t2.t2_c1 = informix.t1.t1_c1 + 1
NESTED LOOP JOIN (Semi Join)
Subquery flattening after view folding in sqexplain – cont.
QUERY: (OPTIMIZATION TIMESTAMP: 03-12-2013 17:02:37)
-----
select z.a from z
where z.b = some (select v.a from vag1 v, z z1 where v.a > z.a)
Estimated Cost: 9
Estimated # of Rows Returned: 11
1) informix.z: SEQUENTIAL SCAN
Filters: informix.z.b > informix.z.a
2) (Temp Table For View): AUTOINDEX PATH
(First Row)
(1) Index Name: (Auto Index)
Index Keys: a
(Key-Only)
Lower Index Filter: informix.z.b = (Temp Table For View).a
NESTED LOOP JOIN (Semi Join)
3) informix.z1: SEQUENTIAL SCAN
NESTED LOOP JOIN (Semi Join)
(First Row)
ANSI OUTER JOIN to Informix Outer Join Transformation
• “Simple” ANSI OUTER JOINs can be converted to Informix outer joins
– potentially gives the optimizer more join choices
– ON clause filters must be of the form “col = col” and involve the tables of the current join
– WHERE clause filters cannot reference subservient tables, flattened subquery tables, correlated subqueries, or UDRs
• If any one join cannot be transformed, the entire query is not transformed
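The ON clause rule can be written as a small eligibility check. This is a minimal sketch of our own model in Python, not Informix internals: the Col and Pred classes and the function names are hypothetical, and the WHERE clause restrictions are not modeled. Under this reading, the two sqexplain examples on the following slides come out as convertible and not convertible, matching their plain NESTED LOOP JOIN versus NESTED LOOP JOIN(LEFT OUTER JOIN) lines.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Col:
    table: str
    name: str


@dataclass(frozen=True)
class Pred:
    left: Union[Col, int, str]
    op: str
    right: Union[Col, int, str]


def on_clause_convertible(on_preds, join_tables):
    """Every ON conjunct must be 'col = col' over tables in the current join."""
    for p in on_preds:
        if p.op != "=":
            return False
        if not (isinstance(p.left, Col) and isinstance(p.right, Col)):
            return False  # e.g. 'col = constant' disqualifies the join
        if p.left.table not in join_tables or p.right.table not in join_tables:
            return False
    return True


def query_convertible(joins):
    # One non-convertible join blocks the transformation for the whole query
    return all(on_clause_convertible(preds, tables) for preds, tables in joins)


# select * from t1 left join t2 on t1.a = t2.a
simple = [Pred(Col("t1", "a"), "=", Col("t2", "a"))]
# select * from t1 left join t2 on t1.a = t2.a and t1.a = 1
with_const = simple + [Pred(Col("t1", "a"), "=", 1)]

print(on_clause_convertible(simple, {"t1", "t2"}))      # True
print(on_clause_convertible(with_const, {"t1", "t2"}))  # False
```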
ANSI OUTER JOIN to Informix Outer Join in sqexplain
QUERY: (OPTIMIZATION TIMESTAMP: 03-12-2013 17:23:32)
-----
select * from t1 left join t2 on t1.a = t2.a
Estimated Cost: 4
Estimated # of Rows Returned: 3
1) informix.t1: SEQUENTIAL SCAN
2) informix.t2: INDEX PATH
(1) Index Name: informix.ind2
Index Keys: a
(Serial, fragments: ALL)
Lower Index Filter: informix.t1.a = informix.t2.a
NESTED LOOP JOIN
ANSI OUTER JOIN to Informix Outer Join in sqexplain – cont.
QUERY: (OPTIMIZATION TIMESTAMP: 03-12-2013 17:24:01)
-----
select * from t1 left join t2 on t1.a = t2.a and t1.a = 1
Estimated Cost: 4
Estimated # of Rows Returned: 3
1) informix.t1: SEQUENTIAL SCAN
2) informix.t2: INDEX PATH
(1) Index Name: informix.ind2
Index Keys: a
(Serial, fragments: ALL)
Lower Index Filter: informix.t1.a = informix.t2.a
ON-Filters:(informix.t1.a = informix.t2.a AND informix.t1.a = 1 )
NESTED LOOP JOIN(LEFT OUTER JOIN)
Hash Join Support in ANSI JOIN
• Hash join is now supported in ANSI JOIN queries
• The optimizer can consider each join method and choose the best one for each join; hash join can be faster for large joins
• Optimizer costing is adjusted for cases where the build or probe side of a hash join is composite (itself the result of a join)
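A hash join builds a hash table on one input and probes it with the other; for a LEFT OUTER JOIN, probe rows without a match are NULL-extended. A minimal in-memory Python sketch, assuming for illustration that the dominant side (t1 left join t2) probes a hash table built on the composite side (t3 inner join t4) of the query that follows; which side Informix actually builds on is not stated here, and the function name and tuple rows are hypothetical.

```python
from collections import defaultdict


def hash_left_outer_join(probe, build, probe_key, build_key, build_width):
    """Hash join that keeps every probe-side row (LEFT OUTER JOIN).

    probe: rows of the dominant (preserved) side
    build: rows of the subservient side, possibly a composite join result
    build_width: number of columns to NULL-extend when nothing matches
    """
    # Build phase: hash the build side on its join key
    buckets = defaultdict(list)
    for row in build:
        key = build_key(row)
        if key is not None:  # a NULL key can never satisfy '='
            buckets[key].append(row)

    # Probe phase: look up each dominant-side row
    out = []
    for row in probe:
        matches = buckets.get(probe_key(row), [])
        if matches:
            out.extend(row + m for m in matches)
        else:
            out.append(row + (None,) * build_width)
    return out


# probe = (t1 left join t2) as (t1.a, t2.a); build = (t3 join t4) as (t3.a, t4.a)
probe = [(1, 1), (2, None), (3, 3)]
build = [(1, 1), (3, 3)]
result = hash_left_outer_join(probe, build,
                              probe_key=lambda r: r[0],   # t1.a
                              build_key=lambda r: r[1],   # t4.a
                              build_width=2)
print(result)  # [(1, 1, 1, 1), (2, None, None, None), (3, 3, 3, 3)]
```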
Hash Join for ANSI JOIN in sqexplain
QUERY: (OPTIMIZATION TIMESTAMP: 03-14-2013 15:01:22)
-----
select * from (t1 left join t2 on t1.a = t2.a )
left join (t3 inner join t4 on t3.a = t4.a) on t4.a = t1.a
Estimated Cost: 9
Estimated # of Rows Returned: 3
1) informix.t1: SEQUENTIAL SCAN
2) informix.t2: INDEX PATH
(1) Index Name: informix.ind2
Index Keys: a
(Serial, fragments: ALL)
Lower Index Filter: informix.t1.a = informix.t2.a
ON-Filters:informix.t1.a = informix.t2.a
NESTED LOOP JOIN(LEFT OUTER JOIN)
3) informix.t3: SEQUENTIAL SCAN
4) informix.t4: INDEX PATH
(1) Index Name: informix.ind4
Index Keys: a
(Serial, fragments: ALL)
Lower Index Filter: informix.t3.a = informix.t4.a
ON-Filters:informix.t3.a = informix.t4.a
NESTED LOOP JOIN
ON-Filters:informix.t4.a = informix.t1.a
DYNAMIC HASH JOIN (LEFT OUTER JOIN)
Dynamic Hash Filters: informix.t4.a = informix.t1.a
Optimizer costing improvements
• The existing optimizer costing tends to favor index-based scans and joins, which can be problematic for large tables
• 12.1 introduces costing modifications that make hash join more favorable for large tables
• Controlled by the undocumented ONCONFIG parameter SQL_DEF_CTRL (off by default)
– add the 0x200 and 0x800 bits
– set to 0xeb0 to also include the “on-by-default” bits
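The 0xeb0 value is the two new bits OR-ed into the on-by-default bits. Assuming, from the SQL_DEF_CTRL=0x4b0 setting in the costing example, that 0x4b0 is the on-by-default value, the arithmetic is as follows; the constant names are hypothetical.

```python
ON_BY_DEFAULT = 0x4B0              # assumed from the example's SQL_DEF_CTRL=0x4b0
HASH_JOIN_COSTING = 0x200 | 0x800  # the two bits named on this slide (0xa00)

setting = ON_BY_DEFAULT | HASH_JOIN_COSTING
print(hex(setting))                            # 0xeb0
print((ON_BY_DEFAULT & HASH_JOIN_COSTING) == 0)  # True: no bits overlap
```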
Optimizer costing example
SQL_DEF_CTRL=0x4b0:
SELECT
dm.dm_s_symb AS stock,
MONTH(dm.dm_date) AS month,
COUNT(*) AS count_num_days,
MAX(dm.dm_vol) AS max_vol
FROM daily_market dm, security s
WHERE
dm.dm_s_symb = s.s_symb AND
YEAR(dm.dm_date) = "2001"
GROUP BY 1,2
Estimated Cost: 5018350
Estimated # of Rows Returned: 1779368
Temporary Files Required For: Group By
1) informix.s: INDEX PATH
(1) Index Name: informix.pk_security
Index Keys: s_symb
(Key-Only)
(Serial, fragments: ALL)
2) informix.dm: INDEX PATH
Filters: YEAR (informix.dm.dm_date ) = 2001
(1) Index Name: informix.fk_daily_market_security
Index Keys: dm_s_symb
(Serial, fragments: ALL)
Lower Index Filter: informix.dm.dm_s_symb = informix.s.s_symb
NESTED LOOP JOIN
SQL_DEF_CTRL=0xeb0 (same query):
Estimated Cost: 4183207
Estimated # of Rows Returned: 1779368
Temporary Files Required For: Group By
1) informix.dm: SEQUENTIAL SCAN
Filters: YEAR (informix.dm.dm_date ) = 2001
2) informix.s: INDEX PATH
(1) Index Name: informix.pk_security
Index Keys: s_symb
(Key-Only)
(Serial, fragments: ALL)
DYNAMIC HASH JOIN
Dynamic Hash Filters: informix.dm.dm_s_symb = informix.s.s_symb
Temp table optimization
• Temp tables are created when a view or derived table cannot be folded into the main query
• Previously, a temp table included all columns from the underlying tables
• In 12.1, a temp table includes only the columns that are required by the query
– smaller temp table
– more efficient query processing
Temp table optimization example
select rtrim(D12.C36), rtrim(D12.C48), D12.C103, D12.C104
from ( select stock_trans_type.stt_type as C0,
stock_trans_type.stt_desc as C1,
stock_movements.stk_trans_type as C2,
......
stock_master.user_num1 as C103,
system_table.systbl_code as C104
from stock_trans_type, stock_master, system_table,
outer stock_movements
where ......
) D12
right outer join system_type
on D12.C29 = system_type.type_id
where D12.C12 between 129 and 256
and D12.C16 is not null;
The TEMP table for D12 contains only the following columns: C12, C16, C29, C36, C48, C103, C104
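The pruning rule amounts to keeping only the D12 columns that the outer query references. A minimal sketch of the idea, using a regular expression over the query text purely for illustration (the optimizer works on the parsed query, not on text); the helper name is hypothetical and the derived-table body stays elided as in the example.

```python
import re

outer_query = """
select rtrim(D12.C36), rtrim(D12.C48), D12.C103, D12.C104
from ( ... ) D12
right outer join system_type
on D12.C29 = system_type.type_id
where D12.C12 between 129 and 256
and D12.C16 is not null
"""


def required_columns(query, alias):
    """Collect the alias.column references made by the outer query."""
    refs = re.findall(rf"\b{re.escape(alias)}\.(\w+)", query)
    # Sort numerically on the C<n> suffix used in this example
    return sorted(set(refs), key=lambda c: int(c[1:]))


print(required_columns(outer_query, "D12"))
# ['C12', 'C16', 'C29', 'C36', 'C48', 'C103', 'C104']
```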
date() wrapper
• It is not uncommon to use a date() wrapper on a datetime column, for example:
date(datetime) between '01/01/2012' and '01/31/2012'
• In 12.1, index access is enabled if an index exists on the datetime column
• Optimizer estimates for such predicates are also improved
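The sqexplain output for this feature shows the rewrite: date(c2) = D becomes the index range c2 >= EXTEND(D, year to second) and c2 < EXTEND(D, year to second) + 1 day, while date(c2) <= D keeps only the upper bound. A minimal Python sketch of the same half-open range, using the standard datetime module as a stand-in for Informix DATETIME values; the function names and sample values are hypothetical.

```python
from datetime import date, datetime, time, timedelta


def date_eq_range(d):
    """date(col) = d  becomes  start <= col < start + 1 day."""
    start = datetime.combine(d, time.min)  # midnight, analogous to EXTEND(d, year to second)
    return start, start + timedelta(days=1)


def date_le_upper(d):
    """date(col) <= d  becomes  col < (midnight of d) + 1 day."""
    return datetime.combine(d, time.min) + timedelta(days=1)


values = [
    datetime(2010, 12, 28, 23, 59, 59),
    datetime(2010, 12, 29, 0, 0, 0),
    datetime(2010, 12, 29, 18, 30, 0),
    datetime(2010, 12, 30, 0, 0, 0),
]
d = date(2010, 12, 29)

lo, hi = date_eq_range(d)
wrapped = [v for v in values if v.date() == d]   # filter with the wrapper
ranged = [v for v in values if lo <= v < hi]     # index-friendly range
print(wrapped == ranged)  # True

le_wrapped = [v for v in values if v.date() <= d]
le_ranged = [v for v in values if v < date_le_upper(d)]
print(le_wrapped == le_ranged)  # True
```

The upper bound is exclusive, so a value at exactly midnight of the next day is correctly left out.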
date() wrapper in sqexplain
QUERY: (OPTIMIZATION TIMESTAMP: 03-14-2013 12:24:04)
-----
select c1, c2, c3 from tab1
where c4 = 40974 and date(c2) = '12/29/2010'
Estimated Cost: 1
Estimated # of Rows Returned: 1
1) informix.tab1: INDEX PATH
Filters: DATE (informix.tab1.c2 ) = 12/29/2010
(1) Index Name: informix.t1i1
Index Keys: c4 c2
(Serial, fragments: ALL)
Lower Index Filter: (informix.tab1.c4 = 40974 AND informix.tab1.c2 >= EXTEND (12/29/2010 ,year to second) )
Upper Index Filter: informix.tab1.c2 < EXTEND (12/29/2010 ,year to second) + interval( 1) day to day
date() wrapper in sqexplain – cont.
QUERY: (OPTIMIZATION TIMESTAMP: 03-14-2013 12:25:47)
-----
select c1, c2, c3 from tab1
where c4 = 17114 and date(c2) <= '08/09/2007'
Estimated Cost: 1
Estimated # of Rows Returned: 1
1) informix.tab1: INDEX PATH
Filters: DATE (informix.tab1.c2 ) <= 08/09/2007
(1) Index Name: informix.t1i1
Index Keys: c4 c2
(Serial, fragments: ALL)
Lower Index Filter: informix.tab1.c4 = 17114
Upper Index Filter: informix.tab1.c2 < EXTEND (08/09/2007 ,year to second) + interval( 1) day to day
Predicate derivation for ANSI JOIN Query
• The optimizer is able to derive new predicates from existing predicates
– t1.c1 = t2.c2 and t1.c1 = t3.c3 → t2.c2 = t3.c3
– t1.c1 = t2.c2 and t1.c1 >= 5 → t2.c2 >= 5
• Predicate derivation is now enabled for ANSI JOIN queries as well (among dominant tables)
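The two derivations above follow from grouping equal columns into classes and letting any constant comparison on one member apply to every member. A minimal sketch using a union-find over equality classes; it is our own illustration, not Informix code, treats column names as plain strings, and ignores outer-join semantics, so it does not model the dominant-table restriction.

```python
def derive_predicates(equalities, comparisons):
    """equalities: (col, col) pairs meaning col = col.
    comparisons: (col, op, const) triples such as ("t1.c1", ">=", 5)."""
    parent = {}

    def find(c):
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    # Union the two sides of every equality into one class
    for a, b in equalities:
        parent[find(a)] = find(b)

    classes = {}
    for c in list(parent):
        classes.setdefault(find(c), []).append(c)

    derived = set()
    for members in classes.values():
        members.sort()
        # Every pair of columns in a class is equal (transitivity)
        for i, a in enumerate(members):
            for b in members[i + 1:]:
                derived.add((a, "=", b))
        # A comparison with a constant on one member holds for all members
        for col, op, const in comparisons:
            if col in members:
                for m in members:
                    derived.add((m, op, const))
    return derived


preds = derive_predicates([("t1.c1", "t2.c2"), ("t1.c1", "t3.c3")],
                          [("t1.c1", ">=", 5)])
print(("t2.c2", "=", "t3.c3") in preds, ("t2.c2", ">=", 5) in preds)  # True True
```

The same rule explains the Lower Index Filter aoj1.value1 > 15 in the next plan, derived from value1 = value3 and value3 > 15.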
Predicate derivation for ANSI JOIN in sqexplain
QUERY: (OPTIMIZATION TIMESTAMP: 03-15-2013 12:17:32)
-----
select int1, value1, word1, int3, int4, value4
from aoj1 left join (aoj3 left join aoj4 on value3 = value4)
on (value1 = value3 and int1 = int3)
where value3 > 15
Estimated Cost: 6
Estimated # of Rows Returned: 1
1) informix.aoj1: INDEX PATH
(1) Index Name: informix.aoj1_value1
Index Keys: value1
(Serial, fragments: ALL)
Lower Index Filter: informix.aoj1.value1 > 15
2) informix.aoj3: AUTOINDEX PATH
(1) Index Name: (Auto Index)
Index Keys: value3 int3
(Key-Only)
Lower Index Filter: (informix.aoj1.value1 = informix.aoj3.value3
AND informix.aoj1.int1 = informix.aoj3.int3 )
Index Key Filters: (informix.aoj3.value3 > 15 )
3) informix.aoj4: SEQUENTIAL SCAN
ON-Filters:informix.aoj3.value3 = informix.aoj4.value4
DYNAMIC HASH JOIN (LEFT OUTER JOIN)
Dynamic Hash Filters: informix.aoj3.value3 = informix.aoj4.value4
ON-Filters:(informix.aoj1.value1 = informix.aoj3.value3 AND informix.aoj1.int1 = informix.aoj3.int3 )
NESTED LOOP JOIN
Summary
• sqexplain overview
• Set operations
• View folding enhancements
• Subquery flattening after view folding
• ANSI OUTER JOIN to Informix outer join transformation
• Hash join support for ANSI JOIN queries
• Optimizer costing enhancement for hash join
• Temp table optimization
• Better handling of the date() wrapper
• Predicate derivation for ANSI JOIN queries