Early Hash Join: A Configurable
Algorithm for the Efficient and Early
Production of Join Results
Ramon Lawrence
University of Iowa
[email protected]
http://www.cs.uiowa.edu/~rlawrenc/
Introduction
Interactive user querying requires that the DBMS produce the
first few query answers quickly as well as minimize the total
query execution time.
Queries that produce a lot of results with large hash joins have
a slow response time as the smaller input must be completely
partitioned before any output can be generated.
It is desirable to have a hash-based join algorithm for
centralized databases that:
Has rapid response time to produce the first few results
Has overall execution time comparable to hybrid hash join
Can be dynamically configured by the optimizer
Ramon Lawrence Early Hash Join: A Configurable Algorithm for the
Efficient and Early Production of Join Results
The University of Iowa.
Copyright© 2005
Page 2
Previous Work
Hash joins:
hybrid hash join [DeWitt84] - standard join used in most DBMSs
dynamic hash join [DeWitt95,Nakayama88] - dynamic partitioning
symmetric hash join [Hong93,Wilschut91] - dual hash table
ripple join [Haas99,Luo02] - online aggregation, reading policies
MJoin [Ding03] - purges join state using stream punctuation
Mediator-based joins:
Improve overall execution time by executing during delays
instead of plan re-ordering/query scrambling [Raman99, Urhan98].
double pipelined hash join [Ives99] - Tukwila system
XJoin [Urhan00] - probe in-memory partitions when blocked
hash-merge Join [Mokbel04] - sort-merge partitions when blocked
progressive merge join [Dittrich02] - dual sort-based join
Motivation
Interactive users of a centralized DBMS can benefit from the fast
response time inherent in dual-hash table joins.
The challenge is to ensure overall performance is not significantly
sacrificed for this fast response time.
Dual-hash table join has other benefits as the operator is more
easily pipelined (since it is symmetric).
This is valuable for federated joins when one or more of the
inputs may not be local to the database engine.
Reading Strategy
A reading strategy is the set of rules an algorithm uses to decide
how to read from the two inputs when both inputs have tuples
available.
Reading strategies do NOT apply to streaming (push-based)
inputs.
They are useful when the inputs are on a local hard drive or a
fast network source (pull-based).
Reading strategies have been used before for processing top-k
queries and in ripple joins.
The reading strategy for hybrid hash join is to read the entire
smaller input then the larger input. Another strategy is to read
alternately from the inputs.
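A reading strategy of the form A:B can be sketched as a small generator. This is only an illustration under stated assumptions: the function name `read_with_ratio` and its parameters are hypothetical, the inputs are modeled as plain Python iterables, and when one input runs out the sketch simply drains the other. A policy of 1:0 then behaves like hybrid hash join (all of R, then all of S), and 1:1 alternates.

```python
def read_with_ratio(r_input, s_input, a, b):
    """Yield ("R", tuple) / ("S", tuple) pairs, reading a tuples from R
    and then b tuples from S in rotation (hypothetical helper)."""
    if a < 0 or b < 0 or a + b == 0:
        raise ValueError("policy a:b needs non-negative counts, not both zero")
    plan = [("R", iter(r_input), a), ("S", iter(s_input), b)]
    while True:
        for i, (name, source, count) in enumerate(plan):
            for _ in range(count):
                try:
                    item = next(source)
                except StopIteration:
                    # Assumption: once one input is exhausted,
                    # read the rest of the other input in order.
                    other_name, other_source, _ = plan[1 - i]
                    for rest in other_source:
                        yield other_name, rest
                    return
                yield name, item


if __name__ == "__main__":
    for side, value in read_with_ratio([1, 2, 3], [10, 20, 30, 40], 2, 1):
        print(side, value)
```

The sketch covers only the pull-based case described above; it has no notion of blocked or delayed inputs.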
Flushing Policy
The flushing policy determines which tuples in memory are
written to disk when memory must be released to accept new
input.
Previous flushing policies:
Flush the largest single partition (XJoin)
Co-ordinated flushing of a partition pair (Hash-merge join)
The flushing policy affects the duplicate detection strategy of the
join algorithm. It also affects performance in two ways:
1) Join output rate - The number of results generated as input is
being received. This depends on the tuples in memory.
2) Overall execution time - The total time may change
depending on the cost of flushing and post-join cleanup.
Early Hash Join (EHJ) Algorithm
The Early Hash Join (EHJ) algorithm uses a dual hash table
approach. It is specifically designed for a centralized DBMS
where overall execution time is dictated by the flushing and
partitioning speed and not by the input arrival rates.
EHJ uses:
a variable reading strategy that changes when memory is full
a biased flushing policy to favor the smaller input
optimizations to flush join memory state for 1:* joins
simplified duplicate detection that requires no timestamps for
1:* joins and only one timestamp for *:* joins
a background process when used for mediator joins or with
slow network-based inputs
Early Hash Join (EHJ) Algorithm
[Figure: flowchart of the EHJ algorithm. Recoverable node labels:
Start Join; Input left?; Read tuple from R or S (policy); Tuple of R?;
Insert in R table, Probe S table, Output results; Insert in S table,
Probe R table, Output results; Memory full?; Bias Flush;
Initialize 1st cleanup phase; In phase 1?; Initialize 2nd cleanup phase;
On-disk R?; Load R to memory; On-disk S no R?; Initialize probe file
for S partition; Read S tuple, TSProbe R table, Output results;
Input left in S file?; Close S file, Delete on-disk partitions;
Join complete.]
Biased Flushing Policy
The biased flushing policy is designed to keep as much of the
smaller input in memory as possible (similar to hybrid hash join).
Biased flushing policy:
Flush largest non-frozen partition of S (larger input).
If no such partition of S exists, flush smallest, non-frozen
partition of R (smaller input).
Idea of freezing a partition is from dynamic hash join [DeWitt95].
A frozen partition does not accept input once it has been flushed
and is not probed.
XJoin and HMJ do not freeze partitions.
Freezing partitions and using biased flushing simplifies the
duplicate detection strategy.
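The victim selection rule of the biased flushing policy can be sketched as follows. The `Partition` class and `choose_flush_victim` function are hypothetical names introduced for illustration. Skipping empty partitions is an added assumption (flushing them would free no memory), and the sketch does not model the actual disk write.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Partition:
    side: str            # "R" (smaller input) or "S" (larger input)
    index: int           # partition number
    size: int            # tuples currently held in memory
    frozen: bool = False  # frozen partitions accept no input and are not probed


def choose_flush_victim(partitions: List[Partition]) -> Optional[Partition]:
    """Pick the partition to flush: largest non-frozen S partition,
    otherwise the smallest non-frozen R partition."""
    # Assumption: empty partitions are skipped because flushing them frees nothing.
    s_candidates = [p for p in partitions
                    if p.side == "S" and not p.frozen and p.size > 0]
    if s_candidates:
        return max(s_candidates, key=lambda p: p.size)
    r_candidates = [p for p in partitions
                    if p.side == "R" and not p.frozen and p.size > 0]
    if r_candidates:
        return min(r_candidates, key=lambda p: p.size)
    return None  # nothing left to flush


if __name__ == "__main__":
    parts = [Partition("R", 0, 5), Partition("R", 1, 2),
             Partition("S", 0, 3), Partition("S", 1, 7)]
    victim = choose_flush_victim(parts)
    victim.frozen = True  # a flushed partition is frozen
    print(victim)
```

Because S partitions are always chosen first, an R partition is only flushed after every S partition has been frozen, which is the ordering that the simplified duplicate detection relies on.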
Duplicate Detection
Duplicate detection is required so that join results are not regenerated during the cleanup pass.
For common 1:* joins, no timestamps are needed:
A *-side probe tuple is discarded if it finds a match.
A 1-side probe tuple deletes from the hash table any
matching tuples on the *-side.
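The two rules for 1:* joins can be illustrated with a purely in-memory sketch. It assumes R is the 1-side (unique join keys) and S is the *-side, ignores memory limits and flushing entirely, and uses hypothetical names (`one_to_many_join`, `arrivals`). The point is only that a *-side tuple leaves the join state as soon as it has met its single possible match, so a later cleanup pass has nothing to regenerate.

```python
from collections import defaultdict


def one_to_many_join(arrivals):
    """arrivals: iterable of (side, key, payload) with side "R" or "S".
    Returns (results, r_table, s_table) after all input is consumed."""
    r_table = {}                 # key -> R payload (1-side, keys unique)
    s_table = defaultdict(list)  # key -> unmatched S payloads (*-side)
    results = []
    for side, key, payload in arrivals:
        if side == "R":
            r_table[key] = payload
            # 1-side probe: output matches and delete them from the S table.
            for s_payload in s_table.pop(key, []):
                results.append((key, payload, s_payload))
        else:
            if key in r_table:
                # *-side probe: matched, so output and discard the S tuple.
                results.append((key, r_table[key], payload))
            else:
                s_table[key].append(payload)
    return results, r_table, s_table


if __name__ == "__main__":
    stream = [("S", 1, "s1"), ("R", 1, "r1"), ("S", 1, "s2"), ("S", 2, "s3")]
    results, _, remaining = one_to_many_join(stream)
    print(results)
    print(dict(remaining))
```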
For *:* joins, a single timestamp representing the tuple's arrival
order is kept. In the cleanup pass, a result tuple (TR,TS) passes the
timestamp check (and is output) if one of these is true:
1) TS arrived before its partition of S was flushed and TR arrived
after its corresponding partition of S was flushed.
2) TS arrived after its partition of S was flushed but before the
matching partition of R was flushed and TR arrived after TS.
3) TS arrived after partition of R was flushed.
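The three conditions translate directly into a small predicate. This is a sketch under assumptions that are not in the slides: timestamps and flush times are values on one shared arrival counter, `s_flush` and `r_flush` are the flush times of the S partition and the matching R partition, `s_flush <= r_flush` (biased flushing flushes S first), and a partition that was never flushed has a flush time of infinity. All names are hypothetical, and the handling of exact ties is a guess.

```python
import math


def passes_timestamp_check(tr, ts, s_flush, r_flush=math.inf):
    """Return True if the cleanup pass should output the result (TR, TS).

    tr, ts   -- arrival timestamps of the R tuple and the S tuple
    s_flush  -- time the S partition was flushed
    r_flush  -- time the matching R partition was flushed (inf if never)
    """
    if ts < s_flush:
        # Case 1: TS was in memory; it only missed R tuples that
        # arrived after the S partition was flushed.
        return tr > s_flush
    if ts < r_flush:
        # Case 2: TS probed the in-memory R partition on arrival,
        # so only R tuples arriving after TS are new.
        return tr > ts
    # Case 3: TS arrived after the R partition was flushed and
    # could not have been joined with anything yet.
    return True


if __name__ == "__main__":
    print(passes_timestamp_check(tr=5, ts=2, s_flush=3, r_flush=10))
    print(passes_timestamp_check(tr=1, ts=2, s_flush=3, r_flush=10))
```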
Performance Analysis
Parameters:
Two input relations R and S with $|R| \le |S|$.
Join memory M where $M \le |R|$. Let $f = M / |R|$.
Reading policy before memory is full is A1:B1. Let $q_1 = A_1/(A_1+B_1)$.
Reading policy after memory is full is A2:B2. Let $q_2 = A_2/(A_2+B_2)$.
Number of I/O operations (not counting reading inputs):
$$2 \left( |R| + |S| - f\,|R| - f \cdot leftS \right)$$
where
$$leftS = |S| - M (1 - q_1) - (1 - q_2) \cdot \frac{|R| - M q_1}{q_2}$$
Note: for hybrid hash join, leftS = |S|.
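A worked calculation may help. The sketch below assumes the cost formula reads 2(|R| + |S| - f|R| - f * leftS) with leftS = |S| - M(1 - q1) - (1 - q2)(|R| - M q1)/q2. The operators were lost in extraction, and this reading was chosen because it reproduces the stated check that leftS = |S| for hybrid hash join (q1 = q2 = 1). The function names and the example sizes are hypothetical.

```python
def left_s(r, s, m, q1, q2):
    """Tuples of S still unread once R has been read completely.
    r, s: input sizes in tuples; m: join memory in tuples;
    q1, q2: fraction of reads taken from R before / after memory fills."""
    return s - m * (1 - q1) - (1 - q2) * (r - m * q1) / q2


def ehj_io(r, s, m, q1, q2):
    """Estimated I/O operations, not counting the reading of the inputs."""
    f = m / r
    return 2 * (r + s - f * r - f * left_s(r, s, m, q1, q2))


if __name__ == "__main__":
    r, s, m = 1000, 4000, 250  # illustrative sizes, not from the experiments
    print("hybrid hash join (1:0 policy):", ehj_io(r, s, m, 1.0, 1.0))
    print("alternating reads (1:1 policy):", ehj_io(r, s, m, 0.5, 0.5))
```

With these numbers, the 1:0 policy gives 7500 I/Os and the 1:1 policy gives 8000. Under this reading of the formula, reading S early costs extra I/O because fewer S tuples remain to be probed directly against the memory-resident part of R.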
Background Process
A background process can be used when the inputs are from
sources other than the hard drive used for flushing.
This includes mediator and federated joins.
As shown in previous work, most valuable for slow or bursty
networks. Not as useful for high speed networks.
Similar to XJoin, use an on-disk partition of S to probe the
matching partition of R currently in memory.
Designed as a background process that runs concurrently with
main join process. This can boost join output rate, but still must
be careful not to needlessly tie up CPU when background
process may only generate a few results.
Duplicate detection is slightly modified when using BG process.
Experimental Evaluation
The performance of early hash join was compared with dynamic
hash join, XJoin, and hash-merge join.
All algorithms were implemented in Java and tested on a TPC-H
1 GB size data set (raw text files). All dual hash table algorithms
used the same table structure.
Summary of results:
EHJ is 10-35% faster than HMJ/XJoin for many-to-many joins
and 25-75% faster for one-to-many joins.
EHJ is faster over all memory sizes except for very small
memory (less than 10% of smaller relation size).
EHJ performs better when the difference in the relative sizes of
the relations is large.
EHJ is within 10% of overall time of DHJ, but with a response
time that is an order of magnitude faster. Intelligent buffering
may be able to further reduce this difference.
Many-to-Many Join Experiment
[Figure: two plots comparing DHJ, EHJ1, HMJ, XJoin, and EHJ2.
"Join Output by Time": Time (sec) versus Results * 1000.
"I/Os Performed": I/Os * 1000 versus Results * 1000.]
Query:
SELECT * FROM PartSupp P1, PartSupp P2
WHERE P1.p_partkey = P2.p_partkey
P1 and P2 were randomly permuted, as PartSupp is sorted on p_partkey.
Memory size = 300,000 tuples (37.5% of 800,000 tuples)
One-to-Many Join Experiment
[Figure: two plots comparing DHJ, EHJ1, HMJ, XJoin, and EHJ2.
"Join Output by Time": Time (sec) versus Results * 1000.
"I/Os Performed": I/Os * 1000 versus Results * 1000.]
Query:
SELECT * FROM Customer C, Orders O
WHERE C.c_custkey = O.o_custkey
Memory size = 75,000 tuples (50% of 150,000 tuples)
Multi-Join Experiment
[Figure: two plots comparing DHJ, EHJ1, HMJ, XJoin, and EHJ2.
"Join Output by Time": Time (sec) versus Results * 1000.
"I/Os Performed": I/Os * 1000 versus Results * 1000.]
Query:
SELECT c_custkey, c_name, c_address, o_orderkey, o_custkey,
o_totalprice, o_orderdate, l_orderkey, l_partkey, l_suppkey,
l_quantity, l_extendedprice FROM Customer C, Orders O, LineItem LI
WHERE C.c_custkey=O.o_custkey and O.o_orderkey = LI.l_orderkey
Memory size = 90,000 tuples (60% of 150,000 tuples) (C+O)
Memory size = 450,000 tuples (30% of 1,500,000) (C/O + LI)
Mediator Experimental Evaluation
The performance of early hash join was compared with dynamic
hash join, XJoin, and hash-merge join for mediator joins.
All algorithms were implemented in Java and tested on a TPC-H
100 MB size data set with queries processed by SQL Server.
DHJ downloaded from both inputs in parallel.
Summary of results:
Both overall execution time and join output rate are dictated by the
speed of the inputs. Little variation in execution time across algorithms.
All early algorithms have response time an order of magnitude
faster than DHJ especially when left input is slow.
For 1:* joins, EHJ is 5%-15% faster overall than HMJ/XJoin and
equivalent to DHJ. It also has a slightly faster join output rate.
For *:* joins, EHJ was only marginally faster in overall time with a
very similar join output rate.
Applications
The primary application is interactive querying on a centralized
database. EHJ has a response time an order of magnitude
faster than hybrid hash join with little execution overhead.
EHJ is also more suitable for pipelining within and outside a
DBMS as it is a symmetric operator that tolerates source
delays. This may be especially valuable for federated queries.
EHJ can be used with LIMIT queries to produce the first few
results without the overhead of partitioning the smaller input.
However, any query with blocking operators such as ordering
and grouping cannot benefit from its fast response time.
Further, it is not order preserving without additional
modifications.
Future Work and Conclusions
EHJ is a useful algorithm for interactive querying and a good
candidate for inclusion into the set of join algorithms for a
centralized DBMS.
EHJ is dynamically configurable using a reading policy and can
adapt to slow input arrival. In a centralized environment, it
significantly outperforms previous early join algorithms.
Future work:
Implement and test performance of EHJ in PostgreSQL.
Expand the algorithm to an N-way join.
Investigate the possibility of making it order preserving and
optimized for distributed/mediator joins.
Thank You!