Redundantly Grouped Cross-object Coding for Repairable Storage

Anwitaman Datta & Frédérique Oggier
NTU Singapore
APSYS 2012, Seoul
http://sands.sce.ntu.edu.sg/CodingForNetworkedStorage

© 2012 A. Datta & F. Oggier, NTU Singapore

What is this work about? The story so far …

• Distributed storage systems scale out to store huge volumes of data
• Failures are inevitable, c'est la vie; erasure coding provides fault-tolerance
• Over time, this incurs overheads: repairing lost redundancy

What is this work about? The story so far …

• An (n, k) code produces n encoded blocks B1 … Bn, stored on n devices
• When a block Bx is lost: retrieve some k'' blocks (k'' = 2 … n-1) to recreate Bx, and re-insert it in (new) storage devices, so that there are (again) n encoded blocks
• Design space: repair fan-in k'', data tx. per node, overall data tx., storage per node

Related works (a non-exhaustive list)

• Codes on codes, e.g. Hierarchical & Pyramid codes
• Network coding, e.g. Regenerating codes
• Locally repairable codes, e.g. Self-repairing codes
• Array codes, …

Most of these works design new codes with inherent repairability properties. This work takes an engineering approach: can we achieve good repairability using existing (mature) techniques? (Our solution is closest in spirit to "codes on codes".)

Survey: "An Overview of Codes Tailor-made for Networked Distributed Data Storage", Anwitaman Datta, Frédérique Oggier, arXiv:1109.2317
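The repair traffic in this design space follows from simple arithmetic. The sketch below (an illustration of the baseline; the function and its name are mine, not from the slides) computes it for the classical decode-and-re-encode repair of an MDS erasure code:

```python
# Illustrative sketch (not from the slides): repair traffic of a
# conventional (n, k) erasure code. Each encoded block holds 1/k of the
# object, and the repairing node retrieves some k'' live blocks to
# recreate a lost block.

def repair_traffic(object_size: float, k: int, fan_in: int) -> float:
    """Bytes transferred to repair one lost encoded block."""
    block_size = object_size / k          # size of one encoded block
    return fan_in * block_size            # k'' such blocks are fetched

# For a 1 GB object under a (15, 10) MDS code, classical repair first
# decodes the whole object, so the fan-in is k = 10:
print(repair_traffic(1e9, k=10, fan_in=10))   # -> 1000000000.0
# i.e. a full object's worth of traffic to restore a single 100 MB block.
```

Reducing that fan-in without giving up the other parameters is exactly what the repairability-oriented designs listed above target.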
Separation of concerns

• Two distinct design objectives for distributed storage systems:
  – Fault-tolerance
  – Repairability
• Related works: codes with inherent repairability properties achieve both objectives together
  – There is nothing fundamentally wrong with that; e.g., we continue to work on self-repairing codes
• This work: an extremely simple idea, introducing two different kinds of redundancy
  – Any (standard) erasure code, for fault-tolerance
  – RAID-4-like parity (across encoded pieces of different objects), for repairability

Redundantly Grouped Cross-object Coding (RGC)

  e_11  e_12  …  e_1k  e_1(k+1)  …  e_1n    ← object 1
  e_21  e_22  …  e_2k  e_2(k+1)  …  e_2n    ← object 2
   …
  e_m1  e_m2  …  e_mk  e_m(k+1)  …  e_mn    ← object m
  p_1   p_2   …  p_k   p_(k+1)   …  p_n     ← parity

Each row: erasure coding of an individual object into n pieces. Last row: RAID-4 parity across the erasure coded pieces of the m different objects, p_j = e_1j ⊕ … ⊕ e_mj.

RGC repairability

• Choosing a suitable m < k gives:
  – A reduction in data transfer for repair
  – Repair fan-in disentangled from the base code parameter k
    • A large k may be desirable for faster (parallel) data access
    • Codes typically trade off repair fan-in, the code parameter k, and the code's storage overhead n/k
• However: the gains from reduced fan-in are probabilistic
  – Analyzed for i.i.d. failures with probability f
• Possible to reduce repair time
  – By pipelining data through the live nodes and computing partial parities

RGC repairability (and storage overhead ρ)

[Figure-only slide: plots of repair cost and storage overhead ρ; nothing further recoverable from the extraction.]

Parameter "m" choice

• Smaller m: lower repair cost, larger storage overhead
• Is there an optimal choice of m? If so, how to determine it?
  – A rule of thumb, rationalized by r simultaneous (multiple) repairs
  – E.g.
for the (n = 15, k = 10) code: m < 5
• m = 3 or 4 implies:
  – A repair bandwidth saving of 40-50% even for f = 0.1
    • Typically, in stable environments, f is much smaller and the relative repair gains are much larger
  – A relatively low storage overhead of 2x or 1.875x

Storage overhead & static resilience

[Figure-only slide; nothing further recoverable from the extraction.]

Further discussions

• Possibility to localize repair traffic
  – Within a storage rack, by placing a whole parity group in the same rack
  – Without introducing any correlated failures among pieces of the same object
• Many unexplored issues
  – Soft errors (flipped bits)
  – Object update, deletion, …
  – Non-i.i.d./correlated failures

Concluding remarks

• RAID-4 parity of erasure encoded pieces of multiple objects:
  – Lowers the cost of data transfer for a repair
  – Reduces repair fan-in
  – Makes it possible to localize repairs (and save precious interconnect bandwidth)
    • Without introducing correlated failures with respect to a single object
  – Pipelining the repair traffic helps realize very fast repairs
    • The repairing node's I/O, bandwidth, or compute does not become a bottleneck
    • The computations for repair are also cheaper than decoding/encoding
  – Retains comparable storage overhead for comparable static resilience relative to using erasure coding alone (surprisingly so!)
    • At least for the specific code parameter choices we tried
• Opens up many interesting questions that can be investigated experimentally as well as theoretically

http://sands.sce.ntu.edu.sg/CodingForNetworkedStorage
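The construction and its repair path can be sketched in a few lines. This is a minimal sketch under assumed names (the helpers `parity_row` and `repair_piece` are mine, not from the slides); the per-object erasure coding step is abstracted away, with encoded pieces treated as opaque byte strings:

```python
import os
from functools import reduce

# Minimal RGC sketch (names are illustrative, not from the slides):
# each of m objects is independently erasure coded into n pieces; a
# RAID-4 style parity piece p_j is then the XOR of the j-th pieces of
# the m objects.

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def parity_row(pieces_by_object, j):
    """RAID-4 parity p_j over column j: XOR of e_1j, ..., e_mj."""
    return reduce(xor, (obj[j] for obj in pieces_by_object))

def repair_piece(pieces_by_object, parity, lost_obj, j):
    """Recreate e_(lost_obj)j from p_j and the m-1 live pieces in
    column j: the repair fan-in is m instead of the base code's k."""
    live = (obj[j] for i, obj in enumerate(pieces_by_object) if i != lost_obj)
    return reduce(xor, live, parity[j])

# Toy example: m = 3 objects, n = 5 pieces each, 4-byte pieces.
m, n, size = 3, 5, 4
pieces = [[os.urandom(size) for _ in range(n)] for _ in range(m)]
parity = [parity_row(pieces, j) for j in range(n)]

lost = pieces[1][2]                       # suppose e_23 is lost
assert repair_piece(pieces, parity, 1, 2) == lost
```

The storage overhead figures above are consistent with ρ = (n/k)·(m+1)/m, since the parity row adds one piece per m pieces stored: for (n, k) = (15, 10), m = 3 gives 1.5 · 4/3 = 2x and m = 4 gives 1.5 · 5/4 = 1.875x, matching the slides.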