Exact Regenerating Codes on Hierarchical Codes

Ernst Biersack
Eurecom, France
Joint work with Zhen Huang

Outline
:: Introduction and motivation
:: Hierarchical Codes
:: Regenerating Codes
:: Combining Hierarchical Codes and Regenerating Codes
:: Conclusion

Motivation: Elements of a P2P backup system
Performance metrics:
- Storage efficiency: how much redundant information do you store?
(From Julian Monteiro)

Motivation: Network bandwidth is a scarce resource
- Our first objective is to find erasure codes that consume less communication bandwidth, i.e. that have a better efficiency factor ρ.
- Network communication bandwidth cannot be “put aside” for later use.
- A second objective should be to adopt repair policies that provide a smooth utilization of the communication bandwidth.

Hierarchical Codes :: Regenerating Codes :: ER-Hierarchical Codes

Linear Codes: Overview
- A particular way to build erasure codes is linear codes.
- Each parity fragment is a linear combination of the original fragments $o_1, \dots, o_4$:
  $p_i = \sum_j c_{i,j}\, o_j$
- In matrix form, $P = C\,O$ and $O = C^{-1} P$, where
  $C = \begin{pmatrix} c_{1,1} & c_{1,2} & c_{1,3} & c_{1,4} \\ c_{2,1} & c_{2,2} & c_{2,3} & c_{2,4} \\ c_{3,1} & c_{3,2} & c_{3,3} & c_{3,4} \\ c_{4,1} & c_{4,2} & c_{4,3} & c_{4,4} \end{pmatrix}$
- If $C$ is invertible, i.e. the coefficient vectors are linearly independent, we can reconstruct the original fragments.
- If the coefficients are chosen randomly in $GF(2^{16})$, the matrix is invertible with very high probability.

Hierarchical codes: Idea
Let us try to change the way the code is built:
[Figure: fragment layout of a 4+3 traditional erasure code and of a 4+3 hierarchical code (originals o1 to o4, parity fragments p1 to p7), each with a plot of the probability of failure against the number of unavailable fragments (1 to 7).]
In the hierarchical code, there are sets of 4 parity fragments that are not sufficient to reconstruct the original file.
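The decodability condition above (a set of fragments suffices only if its coefficient vectors have full rank) can be checked mechanically. The sketch below is illustrative only and rests on several assumptions that are not on the slides: arithmetic is done in the prime field GF(65537) as a stand-in for GF(2^16), the coefficients are small fixed values rather than random ones so that the result is reproducible, and the 4+3 hierarchical layout is assumed to be p1..p3 built from o1, o2 only, p4..p6 built from o3, o4 only, and p7 built from all four originals. The helper names `rank_mod_p` and `count_failing_subsets` are hypothetical.

```python
import itertools

P = 65537  # prime modulus; a stand-in for GF(2^16) (assumption of this sketch)


def rank_mod_p(rows, p=P):
    """Rank of a matrix over the prime field GF(p), via Gaussian elimination."""
    m = [[x % p for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        # Clear this column in every other row.
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def count_failing_subsets(fragments, k, size):
    """Count subsets of `size` coefficient vectors whose rank is below k."""
    return sum(
        1
        for subset in itertools.combinations(fragments, size)
        if rank_mod_p(subset) < k
    )


# Traditional 4+3 code: every fragment combines all four originals.
# Vandermonde rows [1, x, x^2, x^3] make any 4 of the 7 rows independent.
traditional = [[1, x, x * x, x ** 3] for x in range(1, 8)]

# Assumed 4+3 hierarchical layout: p1..p3 combine only o1, o2;
# p4..p6 combine only o3, o4; p7 combines all four originals.
hierarchical = (
    [[1, x, 0, 0] for x in (1, 2, 3)]
    + [[0, 0, 1, x] for x in (1, 2, 3)]
    + [[1, 4, 1, 4]]
)

print(count_failing_subsets(traditional, 4, 4))   # 0 of the 35 subsets fail
print(count_failing_subsets(hierarchical, 4, 4))  # 8 of the 35 subsets fail
```

Under these assumptions, the 8 failing subsets are exactly those that contain all three fragments of one local group: three vectors confined to a two-dimensional subspace contribute only rank 2, so a fourth fragment cannot bring the total to 4.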
Hierarchical codes: Repair degree
The repair degree determines the efficiency factor ρ.
[Figure: probability of cost/failure against the number of unavailable fragments (1 to 7) for a 4+3 traditional erasure code and a 4+3 hierarchical code; legend: ρ = 2, ρ = 4, Failure.]

Hierarchical codes: Recursive Construction
HC-(k,h)
• k original blocks
• h redundant blocks

Hierarchical codes: Theory

Hierarchical codes: Repair
What if p_1 and p_3 are lost?
• Use p_2, 1 out of {p_7, p_8} and 1 out of {p_4, p_5, p_6}: need 3 blocks
What if p_1, p_2, and p_3 are lost?
• Use ….. need ???? blocks
In HC, the earlier we repair, the “cheaper” the repair often is.

64+64 hierarchical codes: Reliability vs Cost
Two possible instances of a 64+64 hierarchical code
[Figure: two panels, one per instance, showing the probability of cost/failure against the number of unavailable fragments (1 to 64); legend: Cost = 2, 4, 8, 16, 32, 64, and Failure.]
- Lower repair cost comes at the price of reduced reliability.

Regenerating Codes: Idea
What happens if…
• upon a repair we contact more than k peers (d > k)?
• every peer stores a parity block that is larger than or equal to the usual parity fragment (i.e. 1/k of the file size), so that |block| ≥ |file|/k?
[Figure: a new block p’4 being repaired from d > k peers, each storing a block b1 with |block| ≥ |file|/k.]
Regenerating codes (by G. Dimakis) give the answer: the repair communication requirements are much smaller.
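How much smaller can be estimated numerically. The slides do not state the formulas, so the sketch below assumes the closed form commonly used in the regenerating-codes literature, with d the repair degree (number of peers contacted) and i an index that moves from the minimum-storage point (i = 0, MSR) to the minimum-bandwidth point (i = k-1, MBR): each helper peer sends β = 2 / (2k(d-k+1) + i(2k-i-1)) of the file, and each peer stores α = β(d-k+i+1). The function names are hypothetical, and 1 MB is taken as 1024 KB.

```python
from fractions import Fraction


def regenerating_params(k, h, d, i):
    """Return (alpha, beta) as fractions of the file size.

    alpha: size of the block stored on each peer.
    beta:  amount of data each of the d helper peers sends during a repair.
    The closed form is an assumption of this sketch, not taken from the slides.
    """
    if not k <= d <= k + h - 1:
        raise ValueError("repair degree d must satisfy k <= d <= k+h-1")
    if not 0 <= i <= k - 1:
        raise ValueError("expansion index i must satisfy 0 <= i <= k-1")
    beta = Fraction(2, 2 * k * (d - k + 1) + i * (2 * k - i - 1))
    alpha = beta * (d - k + i + 1)
    return alpha, beta


def repair_download(k, h, d, i):
    """Total repair traffic (fraction of the file): d helpers times beta each."""
    _, beta = regenerating_params(k, h, d, i)
    return d * beta


def total_storage(k, h, d, i):
    """Total stored data (fraction of the file): k+h peers times alpha each."""
    alpha, _ = regenerating_params(k, h, d, i)
    return (k + h) * alpha


if __name__ == "__main__":
    k, h = 32, 32
    for d, i in [(32, 0), (40, 7), (63, 30)]:
        down_kb = float(repair_download(k, h, d, i)) * 1024  # file of 1 MB
        store_mb = float(total_storage(k, h, d, i))
        print(f"d={d:2d} i={i:2d}  repair {down_kb:8.2f} KB  storage {store_mb:.2f} MB")
```

With k = 32, h = 32 and a 1 MB file, d = 32 and i = 0 reduce to the classical erasure code (1 MB of repair traffic, 2 MB of storage), while d = 40 and i = 7 give roughly 84.6 KB of repair traffic for about 2.1 MB of storage.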
Regenerating codes: Performance
- Regenerating codes are controlled by two additional parameters beyond k and h:
:: d, the repair degree, with k ≤ d ≤ k+h-1
:: i, the block expansion index, with 0 ≤ i ≤ k-1
- Consider a regenerating code with k=32 and h=32:
[Figure: two panels plotted against d (32 to 62) for i = 0, 7, 15, 22, 31. Left: block size stretch (0.8 to 2), annotated “classical erasure codes” and “Additional space”. Right: repair-download reduction (log scale, 0.01 to 1), annotated “Reduced communications bandwidth”.]
MBR: Minimum-Bandwidth Regenerating
MSR: Minimum-Storage Regenerating

Regenerating codes: Performance
- k=32, h=32 and a stored file of 1 MB:
[Figure: the same two panels as on the previous slide.]
Communication is impressively reduced with a small amount of extra storage.

code                             d    i    repair download   storage
Classical erasure code           32   0    1 MB              2 MB
“extreme” regenerating code      63   30   42.47 KB          2.61 MB
“reasonable” regenerating code   40   7    84.62 KB          2.11 MB

Regenerating codes: A new dimension in the trade-off
RC(k,h,d,i)
• k original pieces
• h additional pieces
• d repair degree
• i block expansion factor
[Figure: trade-off plane with axes Storage and Communication; Replication is marked as one point.]
Regenerating codes can be seen as a generalization of replication and RSE that allows a more flexible trade-off between communication and storage requirements.

Regenerating codes: Want to know more
See http://csi.usc.edu/~dimakis/StorageWiki/doku.php
A wiki on Coding for Distributed Storage maintained by Alexandros G. Dimakis

ER-Hierarchical Codes
• Can we combine Hierarchical codes and Regenerating Codes?
• Yes: ER-Hierarchical Codes combine concepts of Hierarchical Codes and Regenerating Codes, namely that
  • most parity blocks are linear combinations of only a small subset of all original blocks, and that
  • a storage block consists of α fragments, while a repair block has only β fragments, with β < α

ER-Hierarchical Codes: Construction
• How to transform a Hierarchical Code into an ER-Hierarchical Code?

ER-Hierarchical Codes: Construction

ER-Hierarchical Codes: Repair
• In HC we would need to download 4 blocks of size 1 each
  • 4 units of traffic
• In ER-HC we now download 5 fragments of size ½ each
  • 2.5 units of traffic

ER-Hierarchical Codes: Traffic reduction (analysis)
• ER-HC reduces the traffic by more than
  • 85% as compared to RSE and Regenerating Codes
  • 40% as compared to Hierarchical Codes
The Regenerating Code used for comparison is MSR with d=k+1.

ER-Hierarchical Codes: Repair Strategies

ER-Hierarchical Codes: Performance (simulation)
In HC and ER-HC, the earlier we repair, the “cheaper” the repair; this is not the case for regenerating codes and RSE.

Conclusion
- We have presented some new codes that greatly reduce the communication overhead.
- Regenerating codes apply principles of network coding to distributed storage and make it possible to trade off storage space for communication bandwidth.
- As compared to RSE codes:
  - Regenerating codes increase the repair degree (number of nodes that must be contacted for repair) but significantly reduce the amount of data downloaded from each node.
  - Hierarchical codes significantly reduce the repair degree while keeping the amount of data transferred by each node the same (as RSE).
- Combining Regenerating Codes and Hierarchical Codes lets us win on both fronts: it reduces both the repair degree and the amount of data transmitted by each node.

Future work
• Further exploit the possibilities offered by ER-Hierarchical Codes
• Study the relationship between coding and repair policies for systems with churn
  • Reactive repair results in repair bursts
  • Proactive repair has smoother repair traffic but does unnecessary repairs. If repairs are cheap, as they are for ER-HC, proactive repair becomes much more attractive, since the earlier we repair, the cheaper the repair.

Thanks
Questions?