Symmetric Allocations for Distributed Storage
Derek Leong¹, Alexandros G. Dimakis², Tracey Ho¹
¹California Institute of Technology, USA
²University of Southern California, USA
GLOBECOM 2010
2010-12-09
A Motivating Example
Suppose you have a distributed storage system comprising
5 storage devices (“nodes”)…
[Figure: five storage nodes, labeled 1 to 5]
A Motivating Example
Each node independently fails with probability 1/3, and
survives with probability 2/3 …
[Figure: nodes 1 to 5 in an outcome where two nodes fail and three survive, which occurs with probability $(1/3)^2 (2/3)^3 \approx 0.0329218$]
[Figure: nodes 1 to 5 in the outcome where all five nodes fail, which occurs with probability $(1/3)^5 \approx 0.00411523$]
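As a quick check of the two outcome probabilities quoted on these slides, the following minimal sketch multiplies the per-node probabilities for a given survival pattern. The function name `pattern_probability` and the choice of which two nodes fail are illustrative assumptions; the slides only fix how many nodes fail and survive.

```python
from fractions import Fraction

def pattern_probability(survived, p_survive=Fraction(2, 3)):
    """Probability of one specific survive/fail pattern for independent nodes.

    `survived` is a list of booleans, one per node (True = node survives).
    """
    prob = Fraction(1)
    for ok in survived:
        prob *= p_survive if ok else 1 - p_survive
    return prob

# Hypothetical pattern: nodes 4 and 5 fail, nodes 1 to 3 survive.
two_fail = pattern_probability([True, True, True, False, False])
# All five nodes fail.
all_fail = pattern_probability([False] * 5)

print(two_fail, float(two_fail))
print(all_fail, float(all_fail))
```

Exact fractions are used so that the decimal values on the slides can be compared without rounding error.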
A Motivating Example
You are given a single data object of unit size,
and a total storage budget of 7/3 …
A Motivating Example
You can use any coding scheme to store any amount
of coded data in each node, as long as the total amount
of storage used is at most the given budget 7/3 …
A Motivating Example
[Figure: coded data, shown as bit strings, stored on nodes 1 to 5]
A Motivating Example
[Figure: coded data stored on nodes 1 to 5, shown with a question mark for a failure pattern that occurs with probability $(1/3)^2 (2/3)^3 \approx 0.0329218$]
A Motivating Example
For maximum reliability, we need to find
(1) an optimal allocation of the given budget over the
nodes, and
(2) an optimal coding scheme
that jointly maximize the probability of successful recovery
A Motivating Example
Using an appropriate code, successful recovery occurs
whenever the data collector accesses at least a unit
amount of data (= size of the original data object)
A Motivating Example
Allocation   Node 1   Node 2   Node 3   Node 4   Node 5   Recovery probability for p = 2/3
A            7/15     7/15     7/15     7/15     7/15     0.79012
B            7/6      7/6      0        0        0        0.88889
C            2/3      2/3      1/3      1/3      1/3      0.90535
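The recovery probabilities of allocations A, B, and C can be checked by brute force. The sketch below enumerates every subset of accessed nodes and adds up the probability of those subsets holding at least a unit amount of data. The function name `recovery_probability` is an illustrative choice, and the enumeration takes time exponential in the number of nodes, so it is only practical for small examples like this one.

```python
from fractions import Fraction
from itertools import combinations

def recovery_probability(allocation, p):
    """Exact recovery probability, enumerating every subset of accessed nodes.

    Each node is accessed independently with probability p; recovery succeeds
    when the accessed nodes together store at least 1 unit of data.
    """
    n = len(allocation)
    total = Fraction(0)
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            if sum(allocation[i] for i in subset) >= 1:
                total += p**r * (1 - p)**(n - r)
    return total

F = Fraction
p = F(2, 3)
allocations = {
    "A": [F(7, 15)] * 5,
    "B": [F(7, 6), F(7, 6), 0, 0, 0],
    "C": [F(2, 3), F(2, 3), F(1, 3), F(1, 3), F(1, 3)],
}
results = {name: recovery_probability(x, p) for name, x in allocations.items()}
for name, prob in results.items():
    print(name, prob, round(float(prob), 5))
```

Each allocation uses the full budget of 7/3, and the nonsymmetric allocation C comes out ahead of both symmetric allocations A and B.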
Problem Formulation
Given n nodes, access probability p, and total storage budget T, find an optimal allocation $(x_1, \ldots, x_n)$ that maximizes the probability of successful recovery:

maximize $\sum_{r \subseteq \{1, \ldots, n\}} p^{|r|} (1-p)^{n-|r|} \cdot \mathbf{1}\left[\sum_{i \in r} x_i \ge 1\right]$ (recovery probability)
subject to $\sum_{i=1}^{n} x_i \le T$ and $x_i \ge 0$ for all i (budget constraint)

• The recovery probability is #P-hard to compute for a given allocation and choice of p
• The optimal allocation also tells us whether coding is beneficial for reliable storage

Trivial cases of minimum and maximum budgets:
• when T = 1, the allocation (1, 0, …, 0) is optimal
• when T = n, the allocation (1, 1, …, 1) is optimal
Related Work
• Discussion between R. Karp, R. Kleinberg, C. Papadimitriou, E. Friedman, and others at UC Berkeley, 2005
• S. Jain, M. Demmer, R. Patra, K. Fall, “Using redundancy to cope with failures in a delay tolerant network,” SIGCOMM 2005
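Symmetric Allocations
• We are particularly interested in symmetric allocations because they are easy to describe and implement
• Let $\bar{x}(n, T, m)$ denote the symmetric allocation that stores $T/m$ in each of m nonempty nodes and nothing in the remaining $n - m$ nodes
• Successful recovery for the symmetric allocation $\bar{x}(n, T, m)$ occurs if and only if at least $\lceil m/T \rceil$ out of the m nonempty nodes are accessed
• Therefore, the recovery probability of $\bar{x}(n, T, m)$ is
$\mathrm{P}[\mathrm{B}(m, p) \ge \lceil m/T \rceil] = \sum_{r = \lceil m/T \rceil}^{m} \binom{m}{r} p^r (1-p)^{m-r}$,
where $\mathrm{B}(m, p)$ is a binomial random variable with m trials and success probability p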
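For a symmetric allocation, the recovery probability reduces to a binomial tail, which is cheap to evaluate. The sketch below assumes each of the m nonempty nodes stores T/m, so that at least ceil(m/T) of them must be accessed; the function name `symmetric_recovery_probability` is an illustrative choice.

```python
from fractions import Fraction
from math import ceil, comb

def symmetric_recovery_probability(m, T, p):
    """P[at least ceil(m/T) of the m nonempty nodes are accessed].

    Assumes each nonempty node stores T/m and is accessed independently
    with probability p. T and p should be Fractions for exact results.
    """
    k = ceil(Fraction(m) / T)  # minimum number of accessed nonempty nodes
    return sum(comb(m, r) * p**r * (1 - p)**(m - r) for r in range(k, m + 1))

T = Fraction(7, 3)
p = Fraction(2, 3)
for m in range(1, 6):
    prob = symmetric_recovery_probability(m, T, p)
    print(m, prob, round(float(prob), 5))
```

With T = 7/3 and p = 2/3, the cases m = 5 and m = 2 correspond to spreading the budget evenly over all five nodes and over two nodes, respectively.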
Asymptotic Optimality of Max Spreading
The symmetric allocation that spreads the budget maximally over all n nodes is asymptotically optimal when the budget T is sufficiently large
RESULT 1
The gap between the recovery probabilities for an optimal allocation and for the symmetric allocation $\bar{x}(n, T, m = n)$, which stores $T/n$ in each of the n nodes, is at most $pT \cdot \mathrm{P}[\mathrm{B}(n-1, p) \le \lceil n/T \rceil - 2]$, where $\mathrm{B}(n-1, p)$ is a binomial random variable.
If p and T are fixed such that $T > 1/p$, then this gap approaches zero as $n \to \infty$.
Asymptotic Optimality of Max Spreading
Proof Idea: Bounding the optimal recovery probability…
1. By conditioning on the number of accessed nodes r, we can express the probability of successful recovery $P_S$ as
$P_S = \sum_{r=1}^{n} p^r (1-p)^{n-r} S_r$,
where $S_r$ is the number of successful r-subsets
2. We can in turn bound $S_r$ by observing that we have $S_r$ inequalities of the form $\sum_{i \in r} x_i \ge 1$, which can be summed up to produce
$\sum_{i=1}^{n} a_{r,i} x_i \ge S_r$,
where $a_{r,i} \le \binom{n-1}{r-1}$ is the number of successful r-subsets that contain node i
3. We therefore have
$S_r \le \min\left( \binom{n-1}{r-1} T, \binom{n}{r} \right)$
4. Applying the bound on $S_r$ to the expression for the probability of successful recovery leads to the conclusion that the optimal recovery probability is at most
$\sum_{r=1}^{n} p^r (1-p)^{n-r} \min\left( \binom{n-1}{r-1} T, \binom{n}{r} \right)$
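The upper bound on the optimal recovery probability can be evaluated numerically. The sketch below assumes the counting bound takes the form S_r ≤ min(C(n−1, r−1)·T, C(n, r)): each node lies in at most C(n−1, r−1) of the r-subsets, and there are C(n, r) r-subsets in total. The function name `optimal_upper_bound` is an illustrative choice.

```python
from fractions import Fraction
from math import comb

def optimal_upper_bound(n, T, p):
    """Upper bound on the recovery probability of any allocation with budget T.

    Assumes S_r <= min(C(n-1, r-1) * T, C(n, r)) for the number of
    successful r-subsets, then sums over the number of accessed nodes r.
    """
    return sum(
        p**r * (1 - p)**(n - r) * min(comb(n - 1, r - 1) * T, comb(n, r))
        for r in range(1, n + 1)
    )

# Motivating example: 5 nodes, budget 7/3, access probability 2/3.
bound = optimal_upper_bound(5, Fraction(7, 3), Fraction(2, 3))
best_known = Fraction(220, 243)  # exact value for allocation (2/3, 2/3, 1/3, 1/3, 1/3)
print(bound, float(bound))
print(best_known <= bound)
```

For the motivating example, the bound sits above the recovery probability of every allocation considered there, as it should.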
Asymptotic Optimality of Max Spreading
Proof Idea: Bounding the suboptimality gap for max spreading…
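1. The recovery probability of the allocation $\bar{x}(n, T, m = n)$ is $\mathrm{P}[\mathrm{B}(n, p) \ge \lceil n/T \rceil]$, where $\mathrm{B}(n, p)$ is a binomial random variable
2. The suboptimality gap for this allocation is therefore at most the difference between the upper bound for the optimal recovery probability and the recovery probability in step 1, which is $pT \cdot \mathrm{P}[\mathrm{B}(n-1, p) \le \lceil n/T \rceil - 2]$
3. For $T > 1/p$, we can apply the Chernoff bound to obtain an upper bound on this gap that decays exponentially in n
4. As $n \to \infty$, this upper bound approaches zero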
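To see the gap shrink as n grows, the sketch below evaluates a gap bound for the motivating example's p = 2/3 and T = 7/3. It rests on two assumptions: that the gap bound takes the form pT·P[B(n−1, p) ≤ ceil(n/T) − 2], and that Hoeffding's inequality can stand in for the Chernoff bound named in the proof. The Hoeffding bound is a simpler tail bound, not necessarily the one the authors used. All function names are illustrative.

```python
from fractions import Fraction
from math import ceil, comb, exp

def binomial_cdf(N, p, k):
    """P[B(N, p) <= k], computed exactly with Fractions."""
    return sum(comb(N, r) * p**r * (1 - p)**(N - r) for r in range(0, k + 1))

def gap_bound(n, T, p):
    """Assumed form of the suboptimality gap bound for maximal spreading."""
    k = ceil(Fraction(n) / T) - 2
    return p * T * binomial_cdf(n - 1, p, k)

def hoeffding_bound(n, T, p):
    """Hoeffding tail bound on the same quantity (stand-in for Chernoff)."""
    N = n - 1
    k = ceil(Fraction(n) / T) - 2
    slack = float(N * p - k)  # distance of k below the mean N*p
    if slack <= 0:
        raise ValueError("tail bound needs k below the mean; check that T > 1/p")
    return float(p * T) * exp(-2 * slack**2 / N)

T = Fraction(7, 3)
p = Fraction(2, 3)
sizes = (5, 50, 500)
gaps = [gap_bound(n, T, p) for n in sizes]
tails = [hoeffding_bound(n, T, p) for n in sizes]
for n, g, h in zip(sizes, gaps, tails):
    print(n, float(g), h)
```

Both columns fall quickly with n in this example, which has T > 1/p.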
Optimal Symmetric Allocation
The problem is nontrivial even when restricted
to symmetric allocations…
[Figure annotation: number of nonempty nodes in the symmetric allocation]
Optimal Symmetric Allocation
Maximal spreading is optimal among symmetric allocations when the budget T is sufficiently large
RESULT 2
If [condition on p and T not recoverable from the source], then either $\bar{x}(n, T, m = n)$ or $\bar{x}(n, T, m = \lfloor \lfloor n/T \rfloor T \rfloor)$ is an optimal symmetric allocation.
Optimal Symmetric Allocation
Minimal spreading is optimal among symmetric allocations when the budget T is sufficiently small
RESULT 3
If [condition on p and T not recoverable from the source], then $\bar{x}(n, T, m = \lfloor T \rfloor)$ is an optimal symmetric allocation.
Coding is unnecessary for such an allocation
Optimal Symmetric Allocation
Proof Idea: Finding the optimal symmetric allocation…
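Recall that the recovery probability of the symmetric allocation $\bar{x}(n, T, m)$ is given by $\mathrm{P}[\mathrm{B}(m, p) \ge k]$, where $k = \lceil m/T \rceil$ and $\mathrm{B}(m, p)$ is a binomial random variable
For constant p and k, $\mathrm{P}[\mathrm{B}(m, p) \ge k]$ is a nondecreasing function of m
1. Observe that we can find an optimal m* from among $\lceil n/T \rceil$ candidates: $m = \lfloor kT \rfloor$ for $k = 1, \ldots, \lfloor n/T \rfloor$, and $m = n$
2. For $m = \lfloor kT \rfloor$, the recovery probability is $\mathrm{P}[\mathrm{B}(\lfloor kT \rfloor, p) \ge k]$
3. RESULT 2 (max spreading optimal) is a sufficient condition on p and T for $\mathrm{P}[\mathrm{B}(\lfloor kT \rfloor, p) \ge k]$ to be nondecreasing in k
4. To obtain RESULT 3 (min spreading optimal), we first establish a sufficient condition on p and T for $\mathrm{P}[\mathrm{B}(\lfloor kT \rfloor, p) \ge k]$ to be nonincreasing in k; we subsequently expand the condition to include other points for which $m = \lfloor T \rfloor$ remains optimal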
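The candidate argument can be sketched in code. Because the recovery probability is nondecreasing in m for a fixed threshold k, it is enough to test the largest m for each threshold: m = floor(kT) for k = 1, …, floor(n/T), plus m = n. The sketch assumes T ≥ 1, and the function names are illustrative; a full scan over m = 1, …, n is included only as a cross-check.

```python
from fractions import Fraction
from math import ceil, comb, floor

def symmetric_recovery_probability(m, T, p):
    """Recovery probability when each of m nonempty nodes stores T/m."""
    k = ceil(Fraction(m) / T)
    return sum(comb(m, r) * p**r * (1 - p)**(m - r) for r in range(k, m + 1))

def candidate_sizes(n, T):
    """Largest m for each recovery threshold k (assumes T >= 1)."""
    sizes = {floor(k * T) for k in range(1, floor(Fraction(n) / T) + 1)}
    sizes.add(n)
    return sorted(sizes)

def best_symmetric(n, T, p):
    """(probability, m) of the best candidate; ties go to the larger m."""
    return max(
        (symmetric_recovery_probability(m, T, p), m)
        for m in candidate_sizes(n, T)
    )

def best_by_full_scan(n, T, p):
    """Cross-check: best probability over every m from 1 to n."""
    return max(symmetric_recovery_probability(m, T, p) for m in range(1, n + 1))

n, T, p = 5, Fraction(7, 3), Fraction(2, 3)
print(candidate_sizes(n, T))
print(best_symmetric(n, T, p))
print(best_by_full_scan(n, T, p))
```

For the motivating example, the candidates are m = 2, 4, and 5, and the best symmetric allocation reaches recovery probability 8/9, which is still below the 0.90535 achieved by the nonsymmetric allocation C.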
Optimal Symmetric Allocation
[Figure: parameter regions in which maximal spreading is optimal among symmetric allocations and in which minimal spreading is optimal among symmetric allocations; other symmetric allocations may be optimal in the gap between them]
Conclusion
• The optimal allocation is not necessarily symmetric
• However, the symmetric allocation that spreads the budget maximally over all n nodes is asymptotically optimal when the budget is sufficiently large
• Furthermore, we are able to specify the optimal symmetric allocation for a wide range of parameter values of p and T
Thank you!