Network Coding for Distributed Storage Systems

IEEE TRANSACTIONS ON INFORMATION THEORY,
SEPTEMBER 2010
Alexandros G. Dimakis
Brighten Godfrey
Yunnan Wu
Martin J. Wainwright
Kannan Ramchandran
Outline
• Introduction
• Background
• Analysis
• Evaluation
• Conclusion

Introduction
• Distributed storage systems provide reliable access to data through redundancy spread over individually unreliable nodes.
• Storing erasure-coded data in distributed storage systems:
  ◦ the encoded data are spread across nodes;
  ◦ less redundancy is required than with replication;
  ◦ stored data must be replaced periodically.

Introduction
• Key issues in distributed storage systems:
  ◦ repair bandwidth
  ◦ storage space
• How can encoded data be generated in a distributed way while transferring as little data as possible?

MDS Codes
• A common practice for repairing a single node failure in an erasure-coded system:
  1. a new node reconstructs the whole encoded data object;
  2. it then generates just one encoded block.
• Maximum Distance Separable (MDS) code
  ◦ (n, k)-MDS property
  ◦ the original file can be recovered from any set of k encoded blocks.

MDS Codes
[Figure: a file of size M is divided into k blocks of size M/k, MDS-encoded into n blocks of size M/k, and stored at n nodes.]

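The divide, encode, and recover-from-any-k steps can be illustrated with a toy code. This is a minimal sketch, assuming a Reed-Solomon-style construction (polynomial evaluation over the prime field GF(257)); the modulus `P` and the function names are illustrative choices, not taken from the slides.

```python
from itertools import combinations

# Toy (n, k) MDS code: the k data symbols are polynomial coefficients and
# node x stores the evaluation f(x) over GF(P). P is an assumed toy modulus.
P = 257  # prime; every data symbol must be an integer in [0, P)


def encode(data, n):
    """Return n shares (x, f(x)) for x = 1..n, where f has coefficients `data`."""
    assert len(data) <= n < P
    return [(x, sum(c * pow(x, j, P) for j, c in enumerate(data)) % P)
            for x in range(1, n + 1)]


def poly_mul_linear(poly, root):
    """Multiply a coefficient list (lowest degree first) by (X - root) mod P."""
    out = [0] * (len(poly) + 1)
    for j, c in enumerate(poly):
        out[j] = (out[j] - root * c) % P
        out[j + 1] = (out[j + 1] + c) % P
    return out


def decode(shares, k):
    """Recover the k data symbols from any k shares by Lagrange interpolation."""
    shares = shares[:k]
    assert len(shares) == k
    coeffs = [0] * k
    for i, (xi, yi) in enumerate(shares):
        basis, denom = [1], 1
        for m, (xm, _) in enumerate(shares):
            if m != i:
                basis = poly_mul_linear(basis, xm)
                denom = denom * (xi - xm) % P
        scale = yi * pow(denom, -1, P) % P  # modular inverse (Python 3.8+)
        for j, c in enumerate(basis):
            coeffs[j] = (coeffs[j] + scale * c) % P
    return coeffs


def all_k_subsets_recover(data, n):
    """Check the (n, k)-MDS property: every k-subset of shares decodes to data."""
    k = len(data)
    shares = encode(data, n)
    return all(decode(list(s), k) == list(data) for s in combinations(shares, k))
```

For example, `all_k_subsets_recover([10, 20], 4)` checks that every pair of the four shares of a (4, 2) code reconstructs the two data symbols.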
Introduction
• Redundancy must be continually refreshed as nodes fail in distributed storage systems.
  ◦ this results in large data transfers across the network.

Introduction
• Erasure codes can be repaired without communicating the whole data object.
• (4, 2)-MSR example when one node fails:
  ◦ the surviving nodes generate smaller parity packets from their data;
  ◦ they forward these packets to the newcomer;
  ◦ the newcomer mixes the packets to generate two new packets.
[Figure: repair example; the packets are labelled 0.5.]

Introduction
• This paper identifies an optimal tradeoff curve between storage and repair bandwidth.
  ◦ smaller storage space => less redundancy => more repair bandwidth
• This paper calls codes that lie on this optimal tradeoff curve regenerating codes.

Introduction
• Minimum-Storage Regenerating (MSR) codes
  ◦ can be repaired efficiently.
• Minimum-Bandwidth Regenerating (MBR) codes
  ◦ each storage node stores slightly more than M/k;
  ◦ in exchange, the repair bandwidth can be reduced.

Erasure Codes
• Classical coding theory focuses on the tradeoff between redundancy and error tolerance.
• In terms of the redundancy-reliability tradeoff, Maximum Distance Separable (MDS) codes are optimal.
  ◦ the best-known MDS codes are Reed-Solomon codes.

Network Coding
• Network coding allows
  ◦ intermediate nodes to generate output data by encoding previously received input data;
  ◦ information to be “mixed” at intermediate nodes.
• This paper investigates the application of network coding to the repair problem in distributed storage.
  ◦ tradeoff between storage and repair network bandwidth

Distributed Storage Systems
• Erasure codes could reduce bandwidth use by an order of magnitude compared with replication.
• Hybrid strategy:
  ◦ one special storage node maintains one full replica;
  ◦ multiple erasure-encoded blocks are stored as well;
  ◦ the replica node transfers only M / k bytes to create a new encoded block;
  ◦ a problem arises when the replica is lost.

Information Flow Graph
Storage-Bandwidth Tradeoff
• The normal redundancy we want to maintain requires n active storage nodes
  ◦ each storing α bits
  ◦ a newcomer downloads β bits each from any d surviving nodes
  ◦ the total repair bandwidth is γ = dβ
• For each set of parameters (n, k, d, α, γ), there is a family of information flow graphs, each of which corresponds to a particular evolution of node failures and repairs.

Storage-Bandwidth Tradeoff
• Denote this family of directed acyclic graphs by G(n, k, d, α, γ).
  ◦ Example: the point (4, 2, 3, 1 Mb, 1.5 Mb) is feasible.

Storage-Bandwidth Tradeoff
• Theorem 1: For any α ≥ α*(n, k, d, γ), the points (n, k, d, α, γ) are feasible.

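The threshold α* can be evaluated numerically. The slide does not reproduce the formula, so the sketch below assumes the piecewise expression from the source paper's Theorem 1, with breakpoints f(i) and slopes g(i); the function names are illustrative, and M is expected to be an integer or a `Fraction`. The parameter n does not appear because it enters only through the constraint d ≤ n − 1.

```python
from fractions import Fraction
import math


def f(i, M, k, d):
    """Assumed bandwidth breakpoint f(i) of the piecewise threshold function."""
    return Fraction(2 * M * d, (2 * k - i - 1) * i + 2 * k * (d - k + 1))


def g(i, d, k):
    """Assumed slope term g(i) of the piecewise threshold function."""
    return Fraction((2 * d - 2 * k + i + 1) * i, 2 * d)


def alpha_star(k, d, gamma, M):
    """Minimum per-node storage for repair bandwidth gamma (inf if infeasible)."""
    gamma = Fraction(gamma)
    if gamma >= f(0, M, k, d):
        return Fraction(M, k)  # storage cannot drop below M/k
    for i in range(1, k):
        if f(i, M, k, d) <= gamma < f(i - 1, M, k, d):
            return (M - g(i, d, k) * gamma) / (k - i)
    return math.inf  # gamma is below the smallest breakpoint f(k-1)
```

Under these assumptions, `alpha_star(2, 3, Fraction(3, 2), 2)` evaluates to 1, which agrees with the feasible point (4, 2, 3, 1 Mb, 1.5 Mb) for a file of M = 2 Mb.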
Theorem Proof (1/4) to (4/4)
[Proof slides: the equations and figures are not preserved in this text.]

Storage-Bandwidth Tradeoff
• Code repair can be achieved if and only if the underlying information flow graph has sufficiently large min-cuts.

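The min-cut condition can be checked numerically. This is a sketch, assuming the lower bound on the min-cut used in the source paper, namely the sum over i = 0, ..., min(d, k) − 1 of min{(d − i)β, α}, which must be at least the file size M; the function names are illustrative.

```python
from fractions import Fraction


def min_cut_lower_bound(k, d, alpha, beta):
    """Sum over i = 0..min(d, k)-1 of min((d - i) * beta, alpha)."""
    return sum(min((d - i) * beta, alpha) for i in range(min(d, k)))


def is_feasible(k, d, alpha, gamma, M):
    """True if the assumed min-cut bound reaches the file size M."""
    beta = Fraction(gamma) / d  # each of the d helper nodes sends beta = gamma / d
    return min_cut_lower_bound(k, d, Fraction(alpha), beta) >= M
```

With k = 2, d = 3, α = 1, γ = 3/2, and M = 2, the bound is min(3/2, 1) + min(1, 1) = 2, so the point is feasible; lowering γ to 7/5 makes the bound fall below 2.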
Storage-Bandwidth Tradeoff
• Optimal tradeoff curve between storage α and repair bandwidth γ
[Figure: tradeoff curves, with the points (γ = 1, α = 0.2) and (γ = 1, α = 0.1) marked.]

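Special Cases (1/2)
• Minimum-Storage Regenerating (MSR) codes: the point on the tradeoff curve with the smallest storage per node α.

Special Cases (2/2)
• Minimum-Bandwidth Regenerating (MBR) codes: the point on the tradeoff curve with the smallest repair bandwidth γ.
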
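The two extreme points can be computed directly. The slides do not preserve the formulas, so this sketch assumes the closed forms from the source paper: (α, γ) = (M/k, Md / (k(d − k + 1))) for MSR, and α = γ = 2Md / (2kd − k² + k) for MBR. The function names are illustrative, and M is expected to be an integer or a `Fraction`.

```python
from fractions import Fraction


def msr_point(k, d, M):
    """Assumed (alpha, gamma) at the minimum-storage end of the tradeoff curve."""
    return Fraction(M, k), Fraction(M * d, k * (d - k + 1))


def mbr_point(k, d, M):
    """Assumed (alpha, gamma) at the minimum-bandwidth end; alpha equals gamma."""
    v = Fraction(2 * M * d, 2 * k * d - k * k + k)
    return v, v
```

For k = 2, d = 3, and M = 2, the MSR point is (1, 3/2) and the MBR point is (6/5, 6/5): the MBR node stores more than M/k = 1 but needs less repair bandwidth.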
Outline
• Introduction
• Background
• Analysis
• Evaluation
  ◦ Node Dynamics and Objectives
  ◦ Model
  ◦ Quantitative Results
• Conclusion

Node Dynamics and Objectives (1/2)
• A permanent failure
  ◦ the permanent departure of a node from the system
  ◦ a disk failure resulting in loss of the data stored on the node
• A transient failure
  ◦ node reboot
  ◦ temporary network disconnection

Node Dynamics and Objectives (2/2)
• A file is available
  ◦ if it can be reconstructed from the data stored on currently available nodes.
• A file is durable
  ◦ if, after permanent node failures, it may still be available at some point in the future.

Model (1/5)
• The model has two key parameters, f and a.
  ◦ a fraction f of the nodes storing file data fail permanently per unit time.
  ◦ at any given time, a node storing data is available with some probability a.
• The expected availability and maintenance bandwidth of various redundancy schemes for maintaining a file of M bytes can then be computed.

Model (2/5)
• Replication
  ◦ redundancy: R replicas
  ◦ stores a total of R × M bytes
  ◦ replaces f × R × M bytes per unit time
  ◦ the file is unavailable if no replica is available
    ▪ probability (1 − a)^R
• Ideal Erasure Codes
  ◦ n = k × R, i.e. redundancy R = n / k
  ◦ transfers just M / k bytes for each packet
  ◦ replaces f × R × M bytes per unit time
  ◦ unavailability probability: the probability that fewer than k of the n nodes are available, U_ideal(n, k) = Σ_{i=0}^{k−1} C(n, i) a^i (1 − a)^{n−i}

Model (3/5)
• Hybrid
  ◦ n = k × (R − 1)
  ◦ stores a total of R × M bytes
  ◦ transfers f × R × M bytes per unit time
  ◦ the file is unavailable if the replica is unavailable and fewer than k erasure-coded packets are available
    ▪ probability (1 − a) × Σ_{i=0}^{k−1} C(n, i) a^i (1 − a)^{n−i}

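The replication, ideal erasure code, and hybrid models can be compared with a short calculation. This is a sketch under the model's assumption that each node is available independently with probability a; the function names and dictionary keys are illustrative, and the hybrid redundancy R = n / k + 1 follows from n = k × (R − 1).

```python
from math import comb


def u_ideal(n, k, a):
    """Probability that fewer than k of n independent nodes are available."""
    return sum(comb(n, i) * a**i * (1 - a)**(n - i) for i in range(k))


def replication(R, M, f, a):
    """Storage, maintenance bandwidth, and unavailability with R replicas."""
    return {"stored": R * M, "bandwidth": f * R * M,
            "unavailability": (1 - a)**R}


def ideal_erasure(n, k, M, f, a):
    """Same quantities for an ideal (n, k) erasure code, R = n / k."""
    R = n / k
    return {"stored": R * M, "bandwidth": f * R * M,
            "unavailability": u_ideal(n, k, a)}


def hybrid(n, k, M, f, a):
    """One full replica plus n erasure-coded packets, so R = n / k + 1."""
    R = n / k + 1
    return {"stored": R * M, "bandwidth": f * R * M,
            "unavailability": (1 - a) * u_ideal(n, k, a)}
```

As a sanity check, an erasure code with k = 1 degenerates to replication: `u_ideal(2, 1, 0.5)` and `replication(2, 1, 0.1, 0.5)["unavailability"]` both equal 0.25.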
Model (4/5)
• Minimum-Storage Regenerating Codes
  ◦ stores a total of R × M bytes
  ◦ redundancy R = n / k
  ◦ replaces f × R × M × δ_MSR bytes per unit time
  ◦ δ_MSR: the extra amount of information
  ◦ unavailability: the file is unavailable when fewer than k of the n nodes are available

Model (5/5)
• Minimum-Bandwidth Regenerating Codes
  ◦ stores a total of M × n × δ_MBR bytes
  ◦ redundancy R = n / k
  ◦ replaces f × M × n × δ_MBR bytes per unit time
  ◦ δ_MBR: the extra amount of information
  ◦ unavailability: the file is unavailable when fewer than k of the n nodes are available

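The maintenance cost of the two regenerating-code models can be sketched as follows. The slides do not preserve the δ formulas, so the code makes two assumptions: each newcomer contacts d = n − 1 nodes, and the per-repair download follows the source paper's MSR and MBR points, which gives δ_MSR = (n − 1) / (n − k) and a per-node MBR storage of 2M(n − 1) / (k(2n − k − 1)). The function names are illustrative, and M is expected to be an integer or a `Fraction`.

```python
from fractions import Fraction


def msr_model(n, k, M, f):
    """(total storage, repair traffic per unit time) for an (n, k) MSR code."""
    R = Fraction(n, k)
    delta_msr = Fraction(n - 1, n - k)  # assumed overhead relative to M / k
    return R * M, f * R * M * delta_msr


def mbr_model(n, k, M, f):
    """(total storage, repair traffic per unit time) for an (n, k) MBR code."""
    # Assumed per-node storage, equal to the download needed for one repair.
    alpha = Fraction(2 * M * (n - 1), k * (2 * n - k - 1))
    return n * alpha, f * n * alpha
```

For n = 4, k = 2, M = 2, and f = 1/10, the MSR model stores 4 units and transfers 3/5 per unit time, while the MBR model stores 24/5 and transfers 12/25: more storage in exchange for less repair traffic.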
Estimating f and a
Quantitative Results (1/2)
Quantitative Results (2/2)
Quantitative Comparison
• Comparison with Hybrid
  ◦ Disadvantage: asymmetric design
• MBR codes
  ◦ Disadvantage:
    ▪ reconstructing the entire file requires communication with n − 1 nodes;
    ▪ if the reading frequency of a file is sufficiently high and k is sufficiently small, this inefficiency could become unacceptable.

Conclusion
• This paper presented a general theoretical framework that
  ◦ determines the information that must be communicated to repair failures in encoded systems;
  ◦ identifies a tradeoff between storage and repair bandwidth.
• One potential application area for the proposed regenerating codes is distributed archival storage or backup.
  ◦ regenerating codes can potentially offer desirable tradeoffs in terms of redundancy, reliability, and repair bandwidth.