Teradata Join Processing and Explain Facility

Teradata Join Processing
Center of Excellence
Data Warehousing
Wipro Technologies
Join Processing
• Rows to be joined must be on the same AMP.
• For join processing, copies of some or all of the rows may have to be moved to a common AMP.
• Join plans:
  - Product join
  - Merge join
  - Nested join
Join Processing
• General scenarios:
  - Join column is the PI of both tables.
  - Join column is the PI of one of the tables.
  - Join column is not the PI of either table.
Case 1 - PI of both tables
• Rows taking part in the join are already on the same AMP.
• No data movement is necessary.
• Rows are already in sorted order (within the block).
• This is the best-case scenario.
Case 2 - PI of one of the tables
• One table already has its rows on the target AMPs.
• Rows of the other table need to be redistributed to their target AMPs by the hash code of the join column value.
• If that table is small, the optimizer may choose to duplicate it on all AMPs instead.
Case 3 - not the PI of either table
• Rows of both tables need to be redistributed to their target AMPs by the hash code of the join column value.
• The optimizer might choose to duplicate the smaller table on all AMPs.
• This join scenario involves the most data movement.
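The redistribution described in Cases 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not Teradata's actual hashing: `NUM_AMPS`, `amp_for`, `redistribute`, and the sample rows are all hypothetical, and `zlib.crc32` is only a stand-in for the real row hash function.

```python
from collections import defaultdict
import zlib

NUM_AMPS = 4  # assumed number of AMPs for this illustration


def amp_for(value, num_amps=NUM_AMPS):
    """Map a join column value to an AMP (crc32 stands in for the row hash)."""
    return zlib.crc32(str(value).encode()) % num_amps


def redistribute(rows, join_col, num_amps=NUM_AMPS):
    """Place each row on the AMP selected by the hash of its join column value."""
    amps = defaultdict(list)
    for row in rows:
        amps[amp_for(row[join_col], num_amps)].append(row)
    return amps


# Hypothetical rows: table1 is (Col1, Col2, Col3), table2 is (Col1, Col2).
table1 = [(100, "P", 600), (200, "Q", 600), (300, "R", 700), (400, "S", 200)]
table2 = [(600, "P"), (700, "Q"), (200, "L")]

# Case 3: neither join column is a PI, so both tables are redistributed.
spool1 = redistribute(table1, join_col=2)  # hash on table1.Col3
spool2 = redistribute(table2, join_col=0)  # hash on table2.Col1

# Rows with equal join values now share an AMP, so each AMP can join locally.
for amp in sorted(set(spool1) | set(spool2)):
    print(amp, spool1.get(amp, []), spool2.get(amp, []))
```

Because both spools use the same hash on the join value, matching rows always land on the same AMP, which is the property the join step relies on.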
Nested Join
• The optimizer chooses this join strategy when the query has:
  - An equality value for a unique index (UPI or USI) on table 1.
  - A join on a column of that single row to any index on table 2.
• This join uses the least system resources.
Table 1 condition    Table 2 condition     AMPs involved   Rows returned
UPI = data value     data column = PI      2 AMPs          1 or more
USI = data value     data column = PI      3 AMPs          1 or more
UPI = data value     data column = USI     3 AMPs          1
USI = data value     data column = USI     4 AMPs          1
UPI = data value     data column = NUSI    All AMPs        1 or more
USI = data value     data column = NUSI    All AMPs        1 or more
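The nested join idea (fetch one row through a unique index, then use a column of that row to probe an index on the second table) might be sketched like this. The dictionaries only mimic index lookups; `upi_table1`, `index_table2`, `nested_join`, and the sample rows are hypothetical names and data chosen for the illustration.

```python
from collections import defaultdict

# Hypothetical tables, modelled as lists of dict rows.
table1 = [
    {"Col1": 100, "Col2": "P", "Col3": 600},
    {"Col1": 200, "Col2": "Q", "Col3": 600},
    {"Col1": 300, "Col2": "R", "Col3": 700},
]
table2 = [
    {"Col1": 600, "Col2": "P"},
    {"Col1": 700, "Col2": "Q"},
]

# Unique index on table1.Col1: one value identifies at most one row.
upi_table1 = {row["Col1"]: row for row in table1}

# Index on table2.Col1: one value may identify several rows.
index_table2 = defaultdict(list)
for row in table2:
    index_table2[row["Col1"]].append(row)


def nested_join(value):
    """Join the single table1 row where Col1 = value to table2 on Col3 = Col1."""
    outer = upi_table1.get(value)  # step 1: single-row lookup by unique index
    if outer is None:
        return []
    # Step 2: use a column of that row to probe the index on table 2.
    inner_rows = index_table2.get(outer["Col3"], [])
    return [
        {**outer, "t2_Col1": inner["Col1"], "t2_Col2": inner["Col2"]}
        for inner in inner_rows
    ]


print(nested_join(100))
```

Only the rows reached through the two index probes are touched, which is why this strategy needs so few resources compared with joins that scan or move whole tables.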
Product Join
• The most general form of join.
• The optimizer chooses a product join under the following conditions:
  - The WHERE clause is missing.
  - The join condition is not based on equality.
  - Join conditions are ORed together.
  - Table aliases are used incorrectly.
  - The optimizer determines that it is less expensive than the other join types.
• Processing: identify the smaller table and duplicate it in spool on all AMPs, then join each spool row of the smaller table to every row of the larger table.
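As a rough sketch of the processing just described, the loop below compares every row of the smaller table with every row of the larger one, as seen from a single AMP. The function name `product_join`, the comparison counter, and the sample rows are assumptions made for the illustration.

```python
def product_join(small_rows, large_rows, condition):
    """Compare each small-table row with every large-table row on one AMP."""
    pairs = []
    comparisons = 0
    for small_row in small_rows:      # spool copy of the smaller table
        for large_row in large_rows:  # local rows of the larger table
            comparisons += 1
            if condition(small_row, large_row):
                pairs.append((small_row, large_row))
    return pairs, comparisons


# Hypothetical rows: small is (Col1, Col2), large is (Col1, Col2, Col3).
small = [(100, "K"), (200, "L")]
large = [(100, "P", 600), (200, "Q", 600), (300, "R", 700)]

# A non-equality join condition, one of the cases that leads to a product join.
pairs, comparisons = product_join(small, large, lambda s, b: b[0] > s[0])
print(len(pairs), comparisons)
```

The number of comparisons is always the product of the two row counts, whatever the condition is, which is what makes this join expensive on large tables.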
Merge Join
• Commonly used when the join conditions are based on equality.
• Generally more efficient than a product join because fewer row comparisons are needed.
• Steps:
  - Identify the smaller table.
  - Put the qualifying rows from one or both tables into spool.
  - Move the spool rows to the AMPs based on the join column hash (if required).
  - Sort the spool rows by join column hash value (if necessary).
  - Compare the rows that have matching join column hash values.
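The sort and compare steps above can be sketched for a single AMP as follows. This is an illustrative sketch only: `row_hash` uses `zlib.crc32` as a stand-in for Teradata's row hash, and `merge_join` and the sample rows are hypothetical.

```python
import zlib


def row_hash(value):
    """Stand-in for the row hash of a join column value."""
    return zlib.crc32(str(value).encode())


def merge_join(left, right, left_col, right_col):
    """Join two row lists on one AMP by walking them in join-hash order."""
    left_sorted = sorted(left, key=lambda r: row_hash(r[left_col]))
    right_sorted = sorted(right, key=lambda r: row_hash(r[right_col]))
    result = []
    i = j = 0
    while i < len(left_sorted) and j < len(right_sorted):
        left_hash = row_hash(left_sorted[i][left_col])
        right_hash = row_hash(right_sorted[j][right_col])
        if left_hash < right_hash:
            i += 1
        elif left_hash > right_hash:
            j += 1
        else:
            # Find the run of rows sharing this hash value on each side.
            i_end = i
            while (i_end < len(left_sorted)
                   and row_hash(left_sorted[i_end][left_col]) == left_hash):
                i_end += 1
            j_end = j
            while (j_end < len(right_sorted)
                   and row_hash(right_sorted[j_end][right_col]) == left_hash):
                j_end += 1
            for left_row in left_sorted[i:i_end]:
                for right_row in right_sorted[j:j_end]:
                    # Equal hashes can collide, so confirm the real values match.
                    if left_row[left_col] == right_row[right_col]:
                        result.append(left_row + right_row)
            i, j = i_end, j_end
    return result


# Hypothetical rows: table1 is (Col1, Col2, Col3), table2 is (Col1, Col2).
table1 = [(100, "P", 600), (200, "Q", 600), (300, "R", 700),
          (400, "S", 200), (900, "A", 800)]
table2 = [(200, "L"), (600, "P"), (700, "Q"), (100, "K")]

joined = merge_join(table1, table2, left_col=2, right_col=0)
print(sorted(joined))
```

Each side is read once in hash order, so only rows within matching hash runs are compared, in contrast to the all-pairs comparison of a product join.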
Merge Join
[Figure: rows of the two tables listed side by side with their Row Hash, Col1, Col2… columns; rows with equal row hash values are matched.]
Example

Table 1
Col1 (PK)   Col2   Col3 (FK)
100         P      600
200         Q      600
300         R      700
400         S      200
500         T      500
600         X      200
700         Y      300
800         Z      500
900         A      800
1000        B      300
2000        C      300
3000        D      300
4000        E      200

Table 2
Col1 (PK)   Col2…
100         K
200         L
300         M
400         N
500         O
600         P
700         Q
800         R
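Example
Rows of both tables distributed across four AMPs by the hash of Col1:

AMP 1   Table 1: 100 P 600, 800 Z 500, 1000 B 300
        Table 2: 100 K, 800 R
AMP 2   Table 1: 400 S 200, 700 Y 300, 2000 C 300, 4000 E 200
        Table 2: 400 N, 700 Q
AMP 3   Table 1: 200 Q 600, 500 T 500, 3000 D 300
        Table 2: 200 L, 500 O
AMP 4   Table 1: 300 R 700, 600 X 200, 900 A 800
        Table 2: 300 M, 600 P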
Row Distribution Strategy 1
• No redistribution needed.
• No sorting needed.
• Join columns of both tables are PIs:
  - Rows involved in the join are located on the same AMP.
Case 1 - Example
SELECT * FROM Table1 t1
INNER JOIN
Table2 t2
ON t1.Col1 = t2.Col1
No rows move: each AMP joins the Table 1 and Table 2 rows it already holds, so the distribution is the same as in the Example above.
Row Distribution Strategy 2
• Redistribute and sort one of the tables on the row hash of the join column.
• Applies when the join column is the PI of one of the tables:
  - One of the tables is already distributed on the join column row hash.
  - The optimizer redistributes the other table and sorts it on the join column row hash.
Case 2 – Example
SELECT * FROM Table1 t1
INNER JOIN
Table2 t2
ON t1.Col3 = t2.Col1
Before the join, the rows are distributed as in the Example above. Table 1 is then redistributed into spool by the row hash of Col3, while Table 2 stays in place:

AMP 1   Spool (Table 1): 900 A 800
        Table 2:         100 K, 800 R
AMP 2   Spool (Table 1): 300 R 700
        Table 2:         400 N, 700 Q
AMP 3   Spool (Table 1): 600 X 200, 400 S 200, 4000 E 200, 800 Z 500, 500 T 500
        Table 2:         200 L, 500 O
AMP 4   Spool (Table 1): 1000 B 300, 3000 D 300, 700 Y 300, 2000 C 300, 200 Q 600, 100 P 600
        Table 2:         300 M, 600 P
Row Distribution Strategy 3
• Duplicate and sort the smaller table on all AMPs; build the larger table locally on each AMP and sort it.
• The optimizer considers this strategy if it finds that redistributing the larger table is more expensive than duplicating the smaller table.
Case 2 – Example
Before the join, the rows are distributed as in the Example above. Table 2 (the smaller table) is duplicated into spool on every AMP, and each AMP builds and sorts a local spool of its own Table 1 rows:

AMP 1       Spool (Table 1, local): 1000 B 300, 100 P 600, 800 Z 500
AMP 2       Spool (Table 1, local): 400 S 200, 4000 E 200, 700 Y 300, 2000 C 300
AMP 3       Spool (Table 1, local): 3000 D 300, 500 T 500, 200 Q 600
AMP 4       Spool (Table 1, local): 600 X 200, 300 R 700, 900 A 800
Every AMP   Spool (Table 2, duplicated): 100 K, 200 L, 300 M, 400 N, 500 O, 600 P, 700 Q, 800 R
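The trade-off between redistributing the larger table and duplicating the smaller one can be illustrated with a deliberately crude row-count comparison. This is not Teradata's cost model, which also weighs factors such as row size and sorting; the function names and the example numbers are assumptions made for this sketch.

```python
def rows_moved_by_redistribution(large_rows):
    """Upper bound: every row of the larger table may be sent to another AMP."""
    return large_rows


def rows_moved_by_duplication(small_rows, num_amps):
    """Every row of the smaller table is copied to every AMP."""
    return small_rows * num_amps


def cheaper_strategy(large_rows, small_rows, num_amps):
    """Pick the strategy that moves fewer rows under this simplified count."""
    redistribute_cost = rows_moved_by_redistribution(large_rows)
    duplicate_cost = rows_moved_by_duplication(small_rows, num_amps)
    if duplicate_cost < redistribute_cost:
        return "duplicate smaller table"
    return "redistribute larger table"


# A very large table joined to a tiny one on a 100-AMP system (made-up sizes).
print(cheaper_strategy(large_rows=1_000_000, small_rows=50, num_amps=100))

# 13 rows joined to 8 rows on 4 AMPs: duplication copies 32 rows.
print(cheaper_strategy(large_rows=13, small_rows=8, num_amps=4))
```

Under this simplified count, duplication only pays off when the smaller table is much smaller than the larger one relative to the number of AMPs.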
Row Distribution Strategy 4
• Duplicate the smaller table on every AMP.
• The optimizer chooses this strategy when the join condition is not based on equality.
• This is the product join scenario.
Explain Facility
• Provides an English translation of the steps chosen by the optimizer.
• Very helpful for estimating the performance of complex queries.
• Helps physical designers with index selection by showing the execution strategy chosen by the optimizer.
Explaining the EXPLAIN
• EXPLAIN output is generally clear and easy to understand, but it contains a few phrases one needs to be familiar with:
  - “…with no residual conditions…”: No conditions remain to be applied other than those used to locate the row.
  - “…eliminating duplicates…”: A DISTINCT operation is being done.
  - “…we do a SMS…”: A set manipulation step such as UNION or EXCEPT is being done.
  - “…we do a BMSMS…”: NUSI bit mapping is being used.
  - “…distributed by hash code to all AMPs…”: Rows are being redistributed to the AMPs by hash code in preparation for a join.
  - “…duplicated on all AMPs…”: A copy of the rows (usually of the smaller table) is being placed on every AMP in preparation for a join.
Statistics
• The optimizer needs demographic information to create the best execution plan for a query:
  - Number of rows in the table.
  - Row size.
  - Number of rows per value.
  - Index information and demographics.
• Based on the statistics, the optimizer estimates the cost and creates the best plan.
• Statistics should be collected on the columns and indexes that are accessed frequently.
• If statistics are not provided, the optimizer falls back on dynamic sampling (random AMP sampling).
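The idea behind a random AMP sample can be sketched as an extrapolation from one AMP's row count to the whole table. This is a simplified illustration of the general idea, not Teradata's actual sampling procedure; `random_amp_estimate` and the per-AMP row counts are hypothetical.

```python
import random


def random_amp_estimate(rows_per_amp, rng=random):
    """Estimate the table row count by scaling up one randomly chosen AMP."""
    sampled_count = rng.choice(rows_per_amp)
    return sampled_count * len(rows_per_amp)


# Hypothetical row counts of one table on four AMPs (13 rows in total).
rows_per_amp = [3, 4, 3, 3]

actual = sum(rows_per_amp)
possible_estimates = sorted({count * len(rows_per_amp) for count in rows_per_amp})

print("actual rows:", actual)
print("possible estimates:", possible_estimates)
print("one sample:", random_amp_estimate(rows_per_amp, random.Random(0)))
```

The estimate depends on which AMP happens to be sampled and says nothing about rows per value, which is one reason to collect statistics on frequently used columns and indexes.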
Questions?