
Redundantly Grouped Cross-object Coding
for Repairable Storage
Anwitaman Datta & Frédérique Oggier
NTU Singapore
APSYS 2012, Seoul
http://sands.sce.ntu.edu.sg/CodingForNetworkedStorage
© 2012 A. Datta & F. Oggier, NTU Singapore
What is this work about?
The story so far …
C’est la vie
Scale-out
Huge volume of data
Over time
Distributed Storage Systems
Failures are inevitable!
Overheads
Repairing lost redundancy
Erasure coding
Fault-tolerance
What is this work about?
The story so far …
[Figure: repair under an (n,k) erasure code. An object is stored as n encoded blocks B1 … Bn. When a block Bx is lost, some k'' blocks (k'' = 2 … n-1) are retrieved to recreate it, and the recreated block is re-inserted into (new) storage devices, so that there are (again) n encoded blocks.]
• Design space
– Repair fan-in k''
– Data transferred per node
– Overall data transferred
– Storage per node
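As a toy illustration of this repair cycle (our own sketch, not from the talk), the code below uses a trivial (n=3, k=2) XOR parity code, chosen only for brevity: a lost block is recreated by retrieving k'' = 2 surviving blocks, and re-inserted so that n encoded blocks exist again.

```python
# Toy (n=3, k=2) erasure code: blocks B1, B2 plus parity B3 = B1 XOR B2.
# Any k'' = 2 surviving blocks suffice to recreate a lost block.

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(b1: bytes, b2: bytes):
    return [b1, b2, xor(b1, b2)]          # n = 3 encoded blocks

def repair(blocks, lost: int):
    survivors = [b for i, b in enumerate(blocks) if i != lost]
    blocks[lost] = xor(*survivors)        # repair fan-in k'' = 2
    return blocks

blocks = encode(b"data-A", b"data-B")
blocks[1] = None                          # a storage node fails
repair(blocks, 1)
assert blocks[1] == b"data-B"             # redundancy restored
```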
Related works
A non-exhaustive list; most of these works look at the design of new codes with inherent repairability properties:
• Codes on codes, e.g. Hierarchical & Pyramid codes
• Network coding, e.g. Regenerating codes
• Locally repairable codes, e.g. Self-repairing codes
• Array codes
• …
This work: an engineering approach. Can we achieve good repairability using existing (mature) techniques? (Our solution is similar to "codes on codes".)
Anwitaman Datta, Frédérique Oggier, "An Overview of Codes Tailor-made for Networked Distributed Data Storage", arXiv:1109.2317
Separation of concerns
• Two distinct design objectives for distributed storage systems
– Fault-tolerance
– Repairability
• Related works: Codes with inherent repairability properties
– Achieve both objectives together
• There is nothing fundamentally wrong with that
– E.g., we continue to work on self-repairing codes
• This work: An extremely simple idea
– Introduce two different kinds of redundancy
• Any (standard) erasure code
– for fault-tolerance
• RAID-4 like parity (across encoded pieces of different objects)
– for repairability
Redundantly Grouped Cross-object Coding (RGC)
e11  e12  …  e1k   e1k+1  …  e1n
e21  e22  …  e2k   e2k+1  …  e2n
…
em1  em2  …  emk   emk+1  …  emn
p1   p2   …  pk    pk+1   …  pn
Erasure coding of individual objects (each row: the n encoded pieces of one object)
RAID-4 of erasure coded pieces of different objects (parity row: pj = e1j ⊕ e2j ⊕ … ⊕ emj)
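A minimal sketch of this cross-object parity, under the RAID-4 (plain XOR) reading of the layout; the helper names are ours, not the paper's. A lost piece e[i][j] is recreated from the other m-1 pieces in its column plus p[j], i.e. with fan-in m instead of the base code's k:

```python
# RGC sketch: rows are erasure-coded pieces of m different objects;
# an extra parity row p[j] = e[0][j] XOR ... XOR e[m-1][j] (RAID-4 style).
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def parity_row(pieces_by_object):
    # pieces_by_object[i][j]: j-th encoded piece of object i (0-based)
    n = len(pieces_by_object[0])
    return [reduce(xor, (obj[j] for obj in pieces_by_object)) for j in range(n)]

def repair_piece(pieces_by_object, parity, i, j):
    # Recreate lost piece e[i][j] from the m-1 other pieces in column j
    # plus p[j]: repair fan-in m, no erasure decoding needed.
    col = [obj[j] for t, obj in enumerate(pieces_by_object) if t != i]
    return reduce(xor, col, parity[j])

m, n = 3, 5
pieces = [[bytes([i * n + j] * 4) for j in range(n)] for i in range(m)]
p = parity_row(pieces)
assert repair_piece(pieces, p, 1, 2) == pieces[1][2]
```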
RGC repairability
• Choosing a suitable m < k
– Reduction in data transfer for repair
– Repair fan-in disentangled from base code parameter “k”
• Large “k” may be desirable for faster (parallel) data access
• Codes typically have trade-offs between repair fan-in, code parameter
“k” and code’s storage overhead (n/k)
•
However: The gains from reduced fan-in is probabilistic
– For i.i.d. failures with probability “f”
• Possible to reduce repair time
– By pipelining data through the live nodes, and computing partial
parity
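The pipelining idea can be sketched as follows (our own simulation of the data flow, with hypothetical names): each live node XORs its piece into a running partial parity and forwards only that, so the repairing node receives a single piece-sized message rather than m of them.

```python
# Sketch of pipelined repair: live nodes forward a running partial parity
# instead of all sending their pieces to the one repairing node.
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def pipelined_repair(live_pieces):
    # Each hop XORs its local piece into the partial parity and passes it on;
    # per-link transfer stays one piece, regardless of the group size m.
    partial = live_pieces[0]
    for piece in live_pieces[1:]:
        partial = xor(partial, piece)   # computed at the next node in the chain
    return partial

pieces = [bytes([v] * 4) for v in (1, 2, 3)]
parity = reduce(xor, pieces)
# lose pieces[0]; pipeline the surviving pieces plus the parity:
assert pipelined_repair(pieces[1:] + [parity]) == pieces[0]
```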
RGC repairability (and storage overhead ρ)
Parameter “m” choice
• Smaller m: lower repair cost, larger storage overhead
• Is there an optimal choice of m? If so, how do we determine it?
– A rule of thumb, rationalized by supporting r simultaneous (multiple) repairs
– E.g., for an (n=15, k=10) code: m < 5
• m = 3 or 4 implies
– Repair bandwidth savings of 40-50% even for f = 0.1
• Typically, in stable environments, f will be much smaller and the relative repair gains much larger
– Relatively low storage overhead of 2x or 1.875x
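The overhead figures above are consistent with ρ = (n/k) · (m+1)/m, i.e. the base code's stretch plus one cross-object parity piece per group of m pieces; this formula is our reading of the slide's numbers, not stated explicitly in it.

```python
# Storage overhead of RGC: base erasure code stretch n/k, plus one
# RAID-4 parity piece for every m cross-object data pieces.
def rgc_overhead(n: int, k: int, m: int) -> float:
    return (n / k) * (m + 1) / m

assert rgc_overhead(15, 10, 3) == 2.0     # the slide's m = 3 case
assert rgc_overhead(15, 10, 4) == 1.875   # the slide's m = 4 case
```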
Storage overhead & static resilience
Further discussions
• Possibility to localize repair traffic
– Within a storage rack, by placing a whole parity group in the same rack
– Without introducing any correlated failures of pieces of the same object
• Many unexplored issues
– Soft errors (flipped bits)
– Object update, deletion, …
– Non i.i.d./correlated failures
Concluding remarks
• RAID-4 parity of erasure-encoded pieces of multiple objects
– Lowers the cost of data transfer for a repair
– Reduces repair fan-in
– Possibility to localize repairs (and save precious interconnect BW)
• w/o introducing correlated failures w.r.t. a single object
– Pipelining the repair traffic helps realize very fast repairs
• Since the repairing node's I/O, bandwidth, or compute does not become a bottleneck
• Also, the computations for repair are cheaper than decoding/encoding
– Retains comparable storage overhead for comparable static resilience, compared to using erasure coding alone (surprisingly so!)
• At least for the specific code parameter choices we tried
• Opens up many interesting questions that can be investigated experimentally as well as theoretically
http://sands.sce.ntu.edu.sg/CodingForNetworkedStorage