
Evaluation of Codes with Inherent Double Replication for Hadoop
Indian Institute of Science
M. Nikhil Krishnan, N. Prakash, V. Lalitha, Birenjith Sasidharan, P. Vijay Kumar
Srinivasan Narayanamurthy, Ranjit Kumar, Siddhartha Nandi

Double Versus Triple Replication
•  Triple replication of data is common in Hadoop:
   o  Data availability for MapReduce (MR) operations
   o  Resiliency to node failure
   o  Storage overhead (OH)
•  Double replication in comparison:
   o  Data availability for MR operations (moderate workloads)
   o  Resiliency to node failure
      –  via additional parities (checksums)
      –  e.g. RAID + mirroring (RAID+m)
   o  Reduced OH

RAID + Mirroring (RAID+m)
[Figure: the (10,9) RAID+m layout – data blocks 1–9 each stored twice, plus two copies of the parity block P]
•  A simple example of double replication with added parity
•  P is an XOR of the 9 data blocks
•  The two parity blocks ensure adequate resiliency
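To make the parity concrete, here is a minimal sketch (not from the talk; block contents and sizes are illustrative) of how the single XOR checksum of a (10,9) RAID+m stripe is computed, and how the XOR of the nine surviving blocks restores a lost one:

```python
# Minimal sketch (illustrative): the single XOR checksum used by the (10,9)
# RAID+m stripe.  P is the byte-wise XOR of the 9 data blocks, and the XOR of
# any 9 surviving blocks of the stripe restores the missing one.
import os
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

data = [os.urandom(64) for _ in range(9)]   # 9 data blocks (illustrative size)
P = xor_blocks(data)                        # the single parity block

lost = 3                                    # pretend one data block is lost
recovered = xor_blocks(data[:lost] + data[lost + 1:] + [P])
assert recovered == data[lost]
```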
The Heptagon-Local Code (HLC): An Alternate Code with Inherent Double Replication
[Figure: HLC = Heptagon 1 + Heptagon 2 + global parity]
•  Looks complicated, but is not, as we shall see...
G. M. Kamath, N. Prakash, V. Lalitha, and P. Vijay Kumar, "Codes with Local Regeneration and Erasure Correction," IEEE Trans. Inform. Theory, 2014.

Performance Comparison: Overhead (OH) and Resiliency (MTTDL)

  Code      OH     MTTDL (in years)
  2-REP     2      1.5E+05
  3-REP     3      1.2E+09
  RAID+m    2.22   2.0E+09
  HLC       2.15   8.3E+09

The HLC has reduced overhead for the desired resiliency, but there is an issue relating to data locality...

Map Task Assignment: Local vs Remote Tasks
[Figure: nodes holding data blocks, with free and busy processors/map slots, illustrating local versus remote task assignment]

Importance of Data Locality for Map Tasks
•  Data locality is measured by the percentage of tasks that are executed locally
•  Increase in data locality → decrease in network traffic → decrease in job time
J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Communications of the ACM, 2008.
M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker, and I. Stoica, "Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling," in ACM EuroSys, 2010.

The Issue with Data Locality
•  The HLC does not fare well in comparison with RAID+m in terms of locality for map task assignment
•  We will shortly see the reason for this
•  One workaround is to have a larger number of processors per node, as we do here

An Alternate Idea for Dealing with the Data Locality Issue
•  Modify HDFS to permit coding across files – this would eliminate the issue altogether (but is not part of the present work)
•  Other options?

Evolution of the Heptagon-Local Code
[Diagram: Regenerating Codes (examples: the Pentagon Code and the Heptagon Code) and Locally Repairable Codes lead to the Heptagon-Local Code]
•  Regenerating Codes and Locally Repairable Codes are two classes of codes designed with distributed storage in mind

The Pentagon Code
[Figure: five nodes storing N1: {1,2,3,4}, N2: {4,7,9,P}, N3: {3,6,8,P}, N4: {2,5,8,9}, N5: {1,5,6,7}]
•  An example of a code with inherent double replication
•  9 data blocks encoded into 20 blocks and stored in 5 nodes
•  Each node stores 4 blocks

Building the Pentagon Code
•  Input: 9 data blocks
N. B. Shah, K. V. Rashmi, P. Vijay Kumar and K. Ramchandran, "Distributed Storage Codes with Repair-by-Transfer and Non-achievability of Interior Points on the Storage-Bandwidth Tradeoff," IEEE Trans. Inform. Theory, Mar. 2012.

Step 1: Adding a Single Checksum
•  Input: 9 data blocks
•  An additional XOR parity P is added

Step 2: Label the Edges of the Graph with these 10 Blocks
[Figure: fully-connected pentagon with its 10 edges labelled 1–9 and P]
•  Input: 9 data blocks
•  An additional XOR parity P is added
•  The resultant 10 symbols are placed on the edges of a fully-connected pentagon

Final Step: Using Edge Labels to Populate the Nodes
•  Input: 9 data blocks
•  An additional XOR parity P is added
•  The resultant 10 symbols are placed on the edges of a fully-connected pentagon
•  Each node stores the data appearing on its incident edges, and we are done!
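The construction above fits in a few lines of code. The sketch below (illustrative Python, not the authors' implementation) labels the ten edges of a complete graph on five nodes with blocks 1–9 and the parity P, then lets each node store the labels of its four incident edges; this particular labelling reproduces the node contents shown in the slides.

```python
# Sketch of the pentagon construction described above: put the 10 blocks
# (1..9 plus the XOR parity P) on the edges of a fully-connected pentagon,
# and let each node store the labels of its 4 incident edges.
from itertools import combinations
from collections import defaultdict

NODES = ['N1', 'N2', 'N3', 'N4', 'N5']
edge_labels = {                       # one block per edge of K5
    ('N1', 'N5'): '1', ('N1', 'N4'): '2', ('N1', 'N3'): '3', ('N1', 'N2'): '4',
    ('N4', 'N5'): '5', ('N3', 'N5'): '6', ('N2', 'N5'): '7', ('N3', 'N4'): '8',
    ('N2', 'N4'): '9', ('N2', 'N3'): 'P',
}
assert set(edge_labels) == set(combinations(NODES, 2))   # all 10 edges labelled

node_contents = defaultdict(set)
for (u, v), block in edge_labels.items():
    node_contents[u].add(block)       # each block lands on both endpoints of
    node_contents[v].add(block)       # its edge -> inherent double replication

for node in NODES:
    print(node, sorted(node_contents[node]))   # 4 blocks per node
assert all(len(node_contents[n]) == 4 for n in NODES)
```

Because every block sits on the two endpoints of its edge, each block is automatically stored twice, which is exactly the "inherent double replication" property.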
An Alternate View: the Pentagon Code as a Rearrangement of RAID+m
•  The Pentagon Code can be viewed as a compact rearrangement of the blocks of the (10,9) RAID+m code

Resiliency of the Pentagon Code
•  Can tolerate 2 (out of 5) node failures
•  If two nodes fail (say N1 and N5), all blocks except the one they share (block 1) can be copied from the other 3 nodes
•  The block shared between the two failed nodes is recovered via the parity P
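As a rough illustration of the repair logic just described (a sketch, not the authors' implementation), the following tracks block labels only; the actual reconstruction of the lost block uses the same XOR relation as in the earlier parity sketch.

```python
# Conceptual sketch of repairing two failed pentagon nodes: every lost block
# that still has a surviving replica is simply copied, and the single block
# whose two copies both sat on the failed pair is rebuilt from the parity
# relation P = XOR(blocks 1..9).  Only block labels are tracked here.
PENTAGON = {
    'N1': {'1', '2', '3', '4'}, 'N2': {'4', '7', '9', 'P'},
    'N3': {'3', '6', '8', 'P'}, 'N4': {'2', '5', '8', '9'},
    'N5': {'1', '5', '6', '7'},
}

def repair_plan(layout, failed):
    survivors = {n: blocks for n, blocks in layout.items() if n not in failed}
    plan = {}
    for node in failed:
        for block in layout[node]:
            holders = [n for n in survivors if block in survivors[n]]
            plan[block] = (f'copy from {holders[0]}' if holders
                           else 'rebuild via P = XOR of blocks 1..9')
    return plan

# N1 and N5 fail: only their shared block (block 1) needs the parity equation.
for block, action in sorted(repair_plan(PENTAGON, {'N1', 'N5'}).items()):
    print(block, '->', action)
```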
Extending the Pentagon to a Heptagon
•  The construction extends to a polygon of any size
[Figure: the Pentagon (a rearranged (10,9) RAID+m) next to the Heptagon (a rearranged (21,20) RAID+m) – 21 blocks (1–20 and P) placed on the edges of a fully-connected heptagon, each node storing its 6 incident edge labels]
•  OH decreases: 42/20 < 20/9
•  But resiliency is reduced: 2/7 < 2/5 (the heptagon tolerates 2 of 7 node failures, versus 2 of 5 for the pentagon)
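A small sketch of where these numbers come from; the closed form for a general n-sided polygon is an extrapolation of the pentagon and heptagon cases above.

```python
# Parameters of the n-gon version of the construction: one block per edge of
# the complete graph K_n, all but one of them data, each block stored on the
# two endpoints of its edge.  (The general-n formula is an extrapolation of
# the pentagon/heptagon examples in the slides.)
from math import comb

def polygon_code(n):
    edges = comb(n, 2)               # blocks placed on edges: data + 1 parity
    data = edges - 1
    stored = 2 * edges               # every block is stored twice
    return data, stored, stored / data, f'tolerates 2 of {n} node failures'

print(polygon_code(5))   # (9, 20, 2.22..., 'tolerates 2 of 5 node failures')
print(polygon_code(7))   # (20, 42, 2.1,    'tolerates 2 of 7 node failures')
```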
Finally: The Heptagon-Local Code (HLC)
[Figure: HLC = Heptagon 1 + Heptagon 2 + global parities]
•  Global parities offer a simple way to increase the resiliency of the Heptagon Code, at a slight increase in OH
G. M. Kamath, N. Prakash, V. Lalitha, and P. Vijay Kumar, "Codes with Local Regeneration and Erasure Correction," IEEE Trans. Inform. Theory, 2014.

Heptagon-Local Code
•  Two heptagons plus 1 node with 2 global parity blocks; the global parity is computed over all 40 data blocks
•  Overall: 40 data blocks → 86 encoded blocks
•  Stored in 15 nodes
•  Can tolerate any 3 (out of 15) node failures
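A quick tally of the parameters just stated (a sketch; the per-heptagon numbers are taken from the slides):

```python
# Tallying the Heptagon-Local Code: two heptagons (each 20 data blocks stored
# as 42 blocks on 7 nodes) plus one extra node holding 2 global parity blocks.
heptagons = 2
data_per_hept, stored_per_hept, nodes_per_hept = 20, 42, 7
global_parities, parity_nodes = 2, 1

data_blocks = heptagons * data_per_hept                         # 40
encoded_blocks = heptagons * stored_per_hept + global_parities  # 86
nodes = heptagons * nodes_per_hept + parity_nodes               # 15

print(data_blocks, encoded_blocks, nodes, encoded_blocks / data_blocks)
# -> 40 86 15 2.15
```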
Heptagon-Local Code: Recovering from Two Node Erasures
•  2 node failures within a heptagon are recovered locally (as in the Pentagon)

Heptagon-Local Code: Recovering from Three Node Erasures
•  3 node failures within a heptagon are recovered using the two global parities and with the help of the second heptagon

Overhead vs Resiliency

  Code              Overhead   MTTDL (in years)
  2-REP             2          1.46E+05
  3-REP             3          1.20E+09
  Pentagon          2.22       1.05E+08
  Heptagon          2.1        2.68E+07
  (10,9) RAID+m     2.22       2.03E+09
  (12,11) RAID+m    2.18       6.50E+08
  Heptagon-Local    2.15       8.34E+09

•  MTTDL calculated for a 25-node system using standard models
•  The Heptagon-Local code is a better choice than RAID+m
Q. Xin, E. L. Miller, T. Schwarz, D. D. Long, S. A. Brandt, and W. Litwin, "Reliability mechanisms for very large storage systems," in IEEE Massive Storage Systems and Technology (MSST), 2003.

Data Locality for the Heptagon Code
[Figure: heptagon layout – seven nodes, each storing six of the 21 blocks (1–20 and P), with every block on two nodes]
•  Data locality is affected by the concentration of data blocks in the Heptagon Code
•  Say 2 map slots per node and an MR job on 1 heptagon-coded file:
   –  No. of local tasks ≤ (7 × 2) = 14
   –  No. of remote tasks ≥ (20 – 14) = 6
•  Data locality: RAID+m > Heptagon-Local

Experimental Set-up
•  Implemented the Pentagon and Heptagon-Local Codes on top of HDFS-RAID
•  HDFS-RAID: Facebook's open-source implementation of Reed-Solomon codes over HDFS

                                    SET-UP A                     SET-UP B
  Processors (map slots) per node   2                            4
  Nodes                             25                           10
  Platforms                         Dual-core laptops            Server-class machines
  Codes tested                      Pentagon, Heptagon-Local*    Pentagon

* The Heptagon code was used in the experiments, but the Heptagon and Heptagon-Local codes have the same MapReduce performance.

MR Performance (Batch Jobs)
[Charts: batch-job performance under Set-up 1 (2 map slots/node) and Set-up 2 (4 map slots/node)]
•  As expected, substantial loss in performance with 2 map slots
•  With 4 map slots, the Pentagon code performs well, even at a load of 75%

Summary
•  We discussed a coding scheme – the Heptagon-Local Code – having inherent double replication of data blocks
•  The Heptagon-Local Code enjoys good overhead and resiliency when compared with other schemes such as RAID+m
•  For good data locality, a larger number of processor cores per node is needed

Future Work: Restoring Data Locality
•  Modify HDFS to permit erasure coding across files, which would eliminate the issue of data concentration

Related Work: Erasure Codes in Distributed Storage
•  Implementation of locally repairable codes, with a focus on overhead vs repair cost (Huang et al., Sathiamoorthy et al.)
•  Regenerating codes for cloud-based systems, with a focus on decreasing repair bandwidth (Hu et al., Runhui Li et al.)
•  A new erasure coding scheme having better single-node repair cost than a Reed-Solomon code (Rashmi et al.)
•  A new class of codes (rotated Reed-Solomon codes) for improved degraded-read performance (Khan et al.)
•  These codes store a single copy of each data block and, in the Hadoop context, are perhaps most suited for cold-data storage.

Thanks!