(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 14, No. 4, April 2016
Parallelizing K-Way Merging
Hazem M Bahig 1,2 and Ahmed Y Khedr 1,3
1 College of Computer Science and Engineering, Hail University, Hail, KSA.
2 Department of Mathematics, Faculty of Science, Ain Shams University, Cairo, Egypt.
3 Systems and Computer Department, Faculty of Engineering, Al-Azhar University, Cairo, Egypt.
Abstract- The k-way merging problem is to produce a single sorted array from k sorted input arrays. In this
paper, we consider the case where the elements of the k sorted arrays are data records and the key of each record is a
serial number. The problem is used to design efficient external sorting algorithms. We propose two optimal parallel
algorithms for k-way merging. The first merges k sorted arrays containing n records in total into a new sorted array of
length n. The second merges them into a new sorted array of length n+o(n), which is called padded merging. The running
time of each algorithm is O(log n) under EREW PRAM and O(1) under CRCW PRAM.
Keywords- merging; k-merging; padded merging; PRAM; optimal algorithm; parallel algorithm.
I. INTRODUCTION
Given k sorted arrays of total size n as input, the k-way merging problem is to produce a single new sorted array, A,
that contains all the elements of the input. When k=2, the problem is called the binary merging problem, or simply the
merging problem. In general, merging is an important step in many applications in computer science, such as sorting,
reconstruction of large phylogenetic trees, and database management systems [5][13][16][18]. One of these important
applications is the merge sort algorithm. In merge sort, we divide the original array into two equal-size subarrays and
then sort each subarray recursively. After that, we merge the two sorted subarrays.
In many applications, the data to be sorted is too large to fit in internal memory. In this case, the data is stored in
external storage, such as a hard disk. However, the optimal merge sort algorithm performs poorly when the data resides in
external storage, because reading from and writing to external storage is very slow. In this case, k-way merging is an
efficient technique for sorting the data in external storage, and the sorting problem is called external sorting.
The merging problem has been studied by many researchers on sequential and parallel platforms. A summary of this
research is given in Table 1. In this summary, we focus only on the shared memory model, especially the parallel random
access machine (PRAM). In the table, p represents the number of processors. We also use two terms, work and cost. The
work of an algorithm is the total number of operations done by all processors, while the cost of an algorithm is the
product of its running time and the number of processors. We also use α(n) to represent the inverse of Ackermann's function.
From the table we observe the following for the algorithms under PRAM.
1. We can merge two sorted arrays in constant time in some special cases with p=n, as in [3][4].
2. The optimal merging algorithm without any restrictions on the input has running time O(log n) under EREW and
O(log log n) under CREW.
3. The optimal-work merging algorithm for integers has running time O(log log n + log Min{n,m}) under EREW and O(α(n))
under CREW, where the integers are drawn from the domain [1,m].
4. The optimal-work k-way merging algorithm has running time Ω(log n) under EREW and Ω(log log n + log k) under CREW.
In this paper, we study the k-way merging problem on PRAM. In some applications, such as external sorting, the elements
of the k sorted arrays are records, and the records are sorted according to the primary key. We propose two k-way merging
algorithms under EREW and CRCW PRAM. The first algorithm merges the k sorted arrays of total size n into a new sorted
array of size n. The second
algorithm merges the k sorted arrays of total size n into a new array of size n+o(n). Under EREW PRAM, the algorithms
run in logarithmic time, while under CRCW PRAM they run in constant time.
Table 1: Comparison between merging algorithms
Ref.   | Input           | p               | Model      | Time                          | Work       | Cost           | Comments
[12]   | 2 sorted arrays | 1               | Sequential | O(n)                          | O(n)       | O(n)           | ---
[10]   | k sorted arrays | n/log n         | EREW PRAM  | O(log n)                      | O(n)       | O(n)           | ---
[14]   | 2 sorted arrays | n/loglog n      | CREW PRAM  | O(log log n)                  | O(n)       | O(n)           | ---
[9]    | 2 sorted arrays | n               | EREW PRAM  | O(log log n + log Min{n,m})   | O(n)       | O(n log log n) | Integers
[3][4] | 2 sorted arrays | p               | EREW PRAM  | O(n/p)                        | O(n)       | O(n)           | Special case
[6]    | 2 sorted arrays | n/logloglog m   | CREW PRAM  | O(log log log n)              | O(n)       | O(n)           | Integers
[6]    | 2 sorted arrays | n/α(n)          | CREW PRAM  | O(α(n))                       | O(n)       | O(n)           | Integers
[10]   | k sorted arrays | n/log n         | EREW PRAM  | O(log n log k)                | O(n log k) | O(n log k)     | Integers
[17]   | k sorted arrays | (n log k)/log n | CREW PRAM  | O(log n)                      | O(n log k) | O(n log k)     | Integers
[11]   | k sorted arrays | n               | EREW PRAM  | Ω(log n)                      | O(n log k) | O(n log n)     | ---
[11]   | k sorted arrays | n               | CREW PRAM  | Ω(log log n + log k)          | O(n log k) | O(n log n)     | ---
The paper consists of an introduction and four sections. In Section II, we give the definition of the problem and the
model of computation used. In Section III, we describe the main idea, steps, and complexity analysis of the proposed
algorithm under EREW and CRCW PRAM. In Section IV, we extend the domain of the primary key and modify the algorithm
accordingly. Finally, in Section V, we conclude our work.
II. PRELIMINARIES
In this section, we give a brief description of the parallel model used in designing the algorithm and a complete
description of our problem.
A. Parallel Random Access Machine
A Parallel Random Access Machine (PRAM) is the natural extension of the universal sequential model, the Random Access
Machine (RAM). The model is also a type of shared-memory Single Instruction, Multiple Data (SIMD) machine. It consists
of p identical RAM processors and a large shared memory of M cells. The p processors operate synchronously and
communicate through the shared memory. Each processor pi may execute (i) a read from a shared memory cell, (ii) a write
to a shared memory cell, and (iii) a local computation.
To resolve memory access conflicts when reading from and writing to shared memory, three realistic mechanisms have been
proposed.
• Exclusive Read Exclusive Write (EREW) PRAM: no simultaneous read or write by two or more processors from or to the
same memory cell.
• Concurrent Read Exclusive Write (CREW) PRAM: simultaneous reads of the same memory cell by two or more processors are
allowed, but no simultaneous writes by two or more processors to the same memory cell.
• Concurrent Read Concurrent Write (CRCW) PRAM: simultaneous reads or writes from or to the same memory cell by two or
more processors are allowed.
In CRCW, different submodels have been proposed to define how concurrent writes are handled. In our proposed algorithm
we use the Common CRCW. In a Common CRCW PRAM, concurrent writes to the same cell are allowed only if all the writing
processors write the same value.
B. Problem Formulation and Related Problem
We can formulate the problem of k-way merging of records with serial-number keys as follows.
Given k sorted arrays of data records, $R_i = (r_{i,0}, r_{i,1}, \ldots, r_{i,n_i-1})$, $0 \le i < k$, such that: (1) the
elements of each array $R_i$ are sorted on the field "key", i.e., $r_{i,j}.\mathrm{key} < r_{i,j+1}.\mathrm{key}$ for
$0 \le j < n_i - 1$ and $0 \le i < k$; (2) the values of the keys in all k arrays are serial numbers; (3) the total number
of records is $n = n_0 + n_1 + \cdots + n_{k-1}$. The output of the k-way merging is a new sorted array of records
$R = (r_0, r_1, \ldots, r_{n-1})$ such that $r_i.\mathrm{key} < r_{i+1}.\mathrm{key}$ for $0 \le i < n-1$.
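The input conditions can be checked mechanically before merging. The following is a minimal Python sketch and not part of the original formulation. It assumes each record is a (key, data) tuple and reads "serial numbers" as distinct integers that together form a consecutive range, which is the assumption Section III relies on. The name check_input is hypothetical.

```python
from typing import List, Tuple

Record = Tuple[int, str]  # (key, data); the data type is an assumption


def check_input(arrays: List[List[Record]]) -> bool:
    """Return True if every array is strictly sorted by key and all keys
    together form a range of consecutive integers."""
    keys: List[int] = []
    for arr in arrays:
        ks = [key for key, _ in arr]
        # Condition (1): keys strictly increase inside each array.
        if any(a >= b for a, b in zip(ks, ks[1:])):
            return False
        keys.extend(ks)
    if not keys:
        return True
    # n distinct keys spanning exactly n values are consecutive.
    return len(set(keys)) == len(keys) and max(keys) - min(keys) + 1 == len(keys)
```

For instance, the arrays [(10, "a"), (12, "c")] and [(11, "b")] pass the check, while replacing key 12 with 13 leaves a gap and fails it.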
Our proposed algorithm needs to find the minimal and maximal elements, so we recall the optimal results for this problem
on different models of PRAM.
Proposition 1 [2]: The maximum/minimum of n elements in an array A can be computed in O(n/p) time using p EREW PRAM
processors, for p ≤ n/log n.
Proposition 2 [2]: The maximum/minimum of n elements in an array A can be computed in
• Õ(1) time using n CRCW PRAM processors.
• O(log log n) time using n/log log n Common CRCW PRAM processors.
Proposition 3 [2]: The maximum/minimum of n integers in the range $[1, n^{O(1)}]$ can be found in O(1) time using n CRCW
PRAM processors.
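One common way to obtain the bound of Proposition 1 is a local scan followed by a balanced-tree reduction. The sketch below simulates that scheme sequentially in Python and counts the parallel steps. It is an illustration under stated assumptions (non-empty input, p ≥ 1, hypothetical name erew_min) and is not claimed to be the construction given in [2].

```python
import math
from typing import List, Tuple


def erew_min(values: List[int], p: int) -> Tuple[int, int]:
    """Simulate an EREW-style minimum; return (minimum, parallel steps)."""
    n = len(values)
    block = math.ceil(n / p)
    # Phase 1: processor q scans its own block. Blocks are disjoint, so no
    # two processors touch the same cell. This takes `block` parallel steps.
    partial = [min(values[q * block:(q + 1) * block])
               for q in range(p) if q * block < n]
    steps = block
    # Phase 2: tree reduction. In each round, processor q combines cells
    # q and q + stride; the cell pairs of different processors are disjoint.
    stride = 1
    while stride < len(partial):
        for q in range(0, len(partial) - stride, 2 * stride):
            partial[q] = min(partial[q], partial[q + stride])
        stride *= 2
        steps += 1
    return partial[0], steps
```

With n = 16 values and p = 4 = n/log n processors, the scan takes 4 steps and the reduction takes 2 rounds, which is consistent with O(n/p) = O(log n) time.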
III. OPTIMAL K-WAY PARALLEL MERGING ALGORITHM
In this section we present the main idea and the steps to design an optimal parallel algorithm to merge k sorted arrays. We
also analyze the algorithm based on two different models of PRAM: EREW and CRCW.
A. Main Idea
Since the elements of each array are sorted on the field key, and the values of the keys are serial numbers, we can map
the values of the keys of the records into an integer range, so each record can be represented by an integer. We do this
by applying a mapping function that maps the n records into the domain [0,n-1]. After that, we have n consecutive
integers. Therefore, we can apply the address strategy: the record whose key maps to value i is placed at address i in
the output array. The address (or index) strategy is used in many previous algorithms, such as counting sort and
bit-index sort [19]. Figure 1 illustrates the idea of the proposed algorithm. In the figure, we have three sorted arrays,
R0, R1, and R2, of lengths 6, 4, and 5 respectively. Each element in the arrays consists of two fields. The first is the
key of the record, while the other is the remaining data of the record.
B. Steps of the k-way Parallel Merging Algorithm
We give here the main steps to merge k sorted arrays of records whose keys are serial numbers with consecutive values.
The algorithm consists of three main steps as follows.
Step 1: Determine the minimum value, min, of all the keys in the k sorted arrays in parallel.
Step 2: Compute the address array $AR_i$ for each sorted array $R_i$ in parallel as follows:
$ar_{i,j} = M(r_{i,j}.\mathrm{key})$
where M is the mapping function, defined as
$M(r_{i,j}.\mathrm{key}) = r_{i,j}.\mathrm{key} - \mathit{min}$
for all $0 \le i < k$ and $0 \le j < n_i$.
Step 3: For each element $r_{i,j}$ in the sorted array $R_i$, insert $r_{i,j}$ at the correct position in the output
array R as follows:
$r_{ar_{i,j}} = r_{i,j}$
Remark 1: We can combine Steps 2 and 3 into one step as follows:
$r_{M(r_{i,j}.\mathrm{key})} = r_{i,j}$
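The three steps can be sketched sequentially in Python as follows. This is a minimal illustration, not a PRAM implementation: the loops stand in for work that the algorithm assigns to separate processors, records are assumed to be (key, data) tuples, the input is assumed non-empty with consecutive keys, and the name merge_consecutive is hypothetical.

```python
from typing import List, Optional, Tuple

Record = Tuple[int, str]  # (key, data); the data type is an assumption


def merge_consecutive(arrays: List[List[Record]]) -> List[Optional[Record]]:
    """Merge sorted arrays whose keys together form a consecutive range."""
    n = sum(len(arr) for arr in arrays)
    # Step 1: minimum key over all records (a parallel minimum on a PRAM).
    lo = min(key for arr in arrays for key, _ in arr)
    out: List[Optional[Record]] = [None] * n
    # Steps 2 and 3 combined as in Remark 1: each record computes its
    # address key - min and is written there. Addresses are distinct, so
    # the writes of different records never collide.
    for arr in arrays:
        for rec in arr:
            out[rec[0] - lo] = rec
    return out
```

Applied to the three arrays of Figure 1, the data fields of the output appear in the order d10, d00, d20, d01, d02, d21, d11, d22, d12, d03, d04, d23, d24, d13, d05.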
C. Complexity Analysis
In this section we analyze the proposed algorithm according to time, number of processors, cost, and optimality. The analysis
of the algorithm depends on the type of the model used.
Under EREW with p=n/log n processors, the running times of Steps 1, 2, and 3 are O(log n), O(1), and O(log n)
respectively. The overall running time of the proposed algorithm is O(log n). Therefore, the cost of k-way merging is
O(n) and the algorithm is optimal.
Under CRCW with p=O(n) processors, the running times of Steps 1, 2, and 3 are each O(1). The overall running time of the
proposed algorithm is O(1). Therefore, the cost of k-way merging is O(n) and the algorithm is optimal.
Remark 2: If the minimum key is known, we can merge the k arrays in constant time under EREW PRAM.
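The cost figures follow directly from the definition of cost in Section I (running time multiplied by the number of processors). Written out, with $C$ as an assumed symbol for cost and $T$ for the overall running time:

```latex
\begin{align*}
C_{\mathrm{EREW}}(n) &= p \cdot T(n) = \frac{n}{\log n} \cdot O(\log n) = O(n), \\
C_{\mathrm{CRCW}}(n) &= p \cdot T(n) = O(n) \cdot O(1) = O(n).
\end{align*}
```

Since any merging algorithm must write all n output records, a cost of O(n) cannot be improved asymptotically, which is the sense in which both variants are optimal.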
Stage 1: input arrays (minimum key 2016312; address = key - min)
R0  key:     2016313  2016315  2016316  2016321  2016322  2016326
    data:    d00      d01      d02      d03      d04      d05
    address: 1        3        4        9        10       14
R1  key:     2016312  2016318  2016320  2016325
    data:    d10      d11      d12      d13
    address: 0        6        8        13
R2  key:     2016314  2016317  2016319  2016323  2016324
    data:    d20      d21      d22      d23      d24
    address: 2        5        7        11       12
Stage 2: output array R (indices 0 to 14, keys 2016312 to 2016326 in order)
    data:    d10 d00 d20 d01 d02 d21 d11 d22 d12 d03 d04 d23 d24 d13 d05
Figure 1: Two main stages for the proposed algorithm.
IV. OPTIMAL K-WAY PADDED PARALLEL MERGING ALGORITHM
In this section we study the same problem when the keys of the records are not necessarily consecutive. In this case, we
use the same idea as in the previous section, but we allow extra gap spaces in the output. Adding extra gap spaces to the
output is called the padded technique. The concept of padding is used in different problems, such as sorting [20]. In
padded sort, we have n elements drawn from a uniform distribution, and we want to order the n values in an array of
length n+o(n) such that the o(n) unused locations are filled with NULL. We can apply this concept to our problem to
merge the k sorted arrays into a new sorted array of length n+o(n).
The padded concept is based on using extra space. We present the main idea and the steps to design an optimal parallel
algorithm to merge k sorted arrays with padding. We also analyze the algorithm based on two different models of PRAM:
EREW and CRCW.
Stage 1: input arrays (minimum key 2016312; address = key - min)
R0  key:     2016313  2016315  2016316  2016321  2016322  2016336
    data:    d00      d01      d02      d03      d04      d05
    address: 1        3        4        9        10       24
R1  key:     2016312  2016326  2016328  2016333
    data:    d10      d11      d12      d13
    address: 0        14       16       21
R2  key:     2016317  2016318  2016324  2016332  2016334
    data:    d20      d21      d22      d23      d24
    address: 5        6        12       20       22
Stage 2: output array R (indices 0 to 24), listed as index: data
    0: d10, 1: d00, 3: d01, 4: d02, 5: d20, 6: d21, 9: d03, 10: d04, 12: d22, 14: d11,
    16: d12, 20: d23, 21: d13, 22: d24, 24: d05; all other indices hold NULL.
Figure 2: Two main stages for the proposed algorithm.
A. Main Idea
We use an address array of size m, where m>n, because the values of the keys are not necessarily consecutive. We map the
values of the keys of the records into the integer range [0, m-1], where m = max - min + 1 and max and min are the
largest and smallest values of the keys. Figure 2 illustrates the idea of the proposed algorithm, where the minimum value
of the keys is 2016312 and the maximum value of the keys is 2016336, giving m = 25.
B. Steps of the k-way Padded Parallel Merging Algorithm
The algorithm consists of three main steps as follows.
Step 1: Determine the minimum, min, and the maximum, max, values of all the keys in the k sorted arrays in parallel.
Step 2: Compute the address array $AR_i$ for each sorted array $R_i$ in parallel as follows:
$ar_{i,j} = M(r_{i,j}.\mathrm{key})$
where M is the mapping function of Section III. Every address lies in the range [0, max - min], so the output array R
has max - min + 1 cells.
Step 3: For each element $r_{i,j}$ in the sorted array $R_i$, insert $r_{i,j}$ at the correct position in the output
array R as follows:
$r_{ar_{i,j}} = r_{i,j}$
Cells of R that receive no record hold NULL.
The proposed algorithm has the same running time as in the previous section under both models of PRAM.
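The padded variant can be sketched in Python in the same sequential style. The assumptions are labeled here and in the comments: records are (key, data) tuples, None plays the role of NULL, the input is non-empty, and the name merge_padded is hypothetical. The sketch allocates max - min + 1 cells so that the largest address, max - min, is a valid index, which matches the indices 0 to 24 of Figure 2.

```python
from typing import List, Optional, Tuple

Record = Tuple[int, str]  # (key, data); the data type is an assumption


def merge_padded(arrays: List[List[Record]]) -> List[Optional[Record]]:
    """Merge sorted arrays with distinct, possibly non-consecutive keys."""
    keys = [key for arr in arrays for key, _ in arr]
    # Step 1: minimum and maximum key (parallel min/max on a PRAM).
    lo, hi = min(keys), max(keys)
    # One cell per possible key in [lo, hi]; unused cells stay None (NULL).
    out: List[Optional[Record]] = [None] * (hi - lo + 1)
    # Steps 2 and 3: write each record at address key - min.
    for arr in arrays:
        for rec in arr:
            out[rec[0] - lo] = rec
    return out
```

On the arrays of Figure 2 this yields 25 cells, 15 of them occupied; reading the occupied cells from left to right gives the records in key order.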
V. CONCLUSION
In this paper we addressed the problem of merging k sorted arrays of records. Our study focused on the case where the
key of each record is a serial number. We proposed two algorithms under EREW and CRCW PRAM. The first applies when the
keys are consecutive, while the second applies when the keys are not necessarily consecutive. The running time of the
proposed algorithms is constant under CRCW PRAM and O(log n) under EREW PRAM.
ACKNOWLEDGMENT
This research was supported by Research Deanship, Hail University, KSA, on grant R2-2013-CS-4.
REFERENCES
[1] S. Akl. Parallel sorting algorithms. Academic Press, Orlando, 1985.
[2] S. Akl. Parallel computation: models and methods. Prentice Hall, Upper Saddle River, 1997.
[3] H. Bahig. Parallel merging with restrictions. The Journal of Supercomputing, 43(1):99–104, 2008.
[4] H. Bahig. Integer merging on PRAM. Computing, 91(4):365–378, 2011.
[5] J. Bang-Jensen, J. Huang, and L. Ibarra. Recognizing and representing proper interval graphs in parallel using
merging and sorting. Discrete Applied Mathematics, 155(4):442–456, 2007.
[6] O. Berkman and U. Vishkin. On parallel integer merging. Information and Computation, 106:266–285, 1993.
[7] Th. Cormen, C. Leiserson, R. Rivest, and C. Stein. Introduction to algorithms. MIT, Cambridge, 1990.
[8] E. Dekel and I. Ozsvath. Parallel external sorting. Journal of Parallel and Distributed Computing, 6:623–635, 1989.
[9] T. Hagerup and M. Kutylowski. Fast integer merging on the EREW PRAM. Algorithmica, 17:55–66, 1997.
[10] T. Hagerup and C. Rüb. Optimal merging and sorting on the EREW PRAM. Information Processing Letters, 33:181–185, 1989.
[11] T. Hayashi, K. Nakano, and S. Olariu. Work-time optimal k-merge algorithms on the PRAM. IEEE Transactions on
Parallel and Distributed Systems, 9(3):275–282, 1998.
[12] R. Karp and V. Ramachandran. Parallel algorithms for shared-memory machines. In: J. van Leeuwen (ed.), Handbook of
theoretical computer science, Vol. A: Algorithms and complexity. Elsevier, Amsterdam, 869–941, 1990.
[13] D. Knuth. The art of computer programming: sorting and searching. Addison–Wesley, Reading, 1973.
[14] C. Kruskal. Searching, merging, and sorting in parallel computation. IEEE Transactions on Computers,
32(10):942–946, 1983.
[15] T. Merrett. Relational information systems. Reston Publishing Co., Reston, 1984.
[16] S. Olariu, C. Overstreet, and Z. Wen. Reconstructing binary trees in doubly logarithmic CREW time. Journal of
Parallel and Distributed Computing, 27:100–105, 1995.
[17] Z. Wen. Multi-way merging in parallel. IEEE Transactions on Parallel and Distributed Systems, 7(1):11–17, 1996.
[18] P. Valduriez and G. Gardarin. Join and semijoin algorithms for multiprocessors database machines. ACM Transactions
on Database Systems, 9:133–161, 1984.
[19] L. F. Curi-Quintal, J. O. Cadenas, and G. M. Megson. Bit-index sort: A fast non-comparison integer sorting algorithm
for permutations. International Conference on Technological Advances in Electrical, Electronics and Computer Engineering
(TAEECE), 83–87, 2013.
[20] P. D. MacKenzie and Q. F. Stout. Ultra-fast expected time parallel algorithms. Journal of Algorithms, 26:1–33, 1998.