Symmetric Allocations for Distributed Storage

Derek Leong (1), Alexandros G. Dimakis (2), Tracey Ho (1)
(1) California Institute of Technology, USA
(2) University of Southern California, USA

GLOBECOM 2010, 2010-12-09

A Motivating Example

Suppose you have a distributed storage system comprising 5 storage devices (“nodes”).

Each node independently fails with probability 1/3 and survives with probability 2/3. For example, the probability that two specific nodes fail while the other three survive is $(1/3)^2 (2/3)^3 \approx 0.0329218$, and the probability that all five nodes fail is $(1/3)^5 \approx 0.00411523$.

You are given a single data object of unit size and a total storage budget of 7/3.

You can use any coding scheme to store any amount of coded data in each node, as long as the total amount of storage used is at most the given budget of 7/3.

[Figure: nodes 1 to 5 holding coded data under a failure pattern of probability $(1/3)^2 (2/3)^3 \approx 0.0329218$; can the data object be recovered?]
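The two pattern probabilities quoted in the example follow directly from the independence of node failures. A minimal sketch in Python, using exact fractions, reproduces them; the helper name `pattern_probability` is hypothetical and is introduced here only for illustration.

```python
from fractions import Fraction


def pattern_probability(num_failed, num_survived, p_survive):
    """Probability of one specific pattern in which `num_failed` given nodes
    fail and `num_survived` given nodes survive, all independently.
    Hypothetical helper, not part of the original slides."""
    p_fail = 1 - p_survive
    return p_fail ** num_failed * p_survive ** num_survived


p = Fraction(2, 3)  # survival probability from the example

two_fail = pattern_probability(2, 3, p)  # two specific nodes fail, three survive
all_fail = pattern_probability(5, 0, p)  # all five nodes fail

print(two_fail, float(two_fail))  # 8/243, about 0.0329218
print(all_fail, float(all_fail))  # 1/243, about 0.00411523
```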
For maximum reliability, we need to find (1) an optimal allocation of the given budget over the nodes, and (2) an optimal coding scheme, which jointly maximize the probability of successful recovery.

Using an appropriate code, successful recovery occurs whenever the data collector accesses at least a unit amount of data (the size of the original data object).

Recovery probability for p = 2/3:

Allocation   Node 1   Node 2   Node 3   Node 4   Node 5   Recovery probability
A            7/15     7/15     7/15     7/15     7/15     0.79012
B            7/6      7/6      0        0        0        0.88889
C            2/3      2/3      1/3      1/3      1/3      0.90535

Problem Formulation

Given n nodes, access probability p, and total storage budget T, find an optimal allocation $(x_1, \ldots, x_n)$ that maximizes the probability of successful recovery:

maximize $\sum_{\mathbf{r} \subseteq \{1, \ldots, n\}} p^{|\mathbf{r}|} (1-p)^{n-|\mathbf{r}|} \, \mathbf{1}\!\left[\sum_{i \in \mathbf{r}} x_i \ge 1\right]$ (recovery probability; #P-hard to compute for a given allocation and choice of p)
subject to $\sum_{i=1}^{n} x_i \le T$ (budget constraint) and $x_i \ge 0$ for all i.

The optimal allocation also tells us whether coding is beneficial for reliable storage.

Trivial cases of minimum and maximum budgets:
when T = 1, the allocation (1, 0, …, 0) is optimal;
when T = n, the allocation (1, 1, …, 1) is optimal.

Related Work

Discussion between R. Karp, R. Kleinberg, C. Papadimitriou, E. Friedman, and others at UC Berkeley, 2005
S. Jain, M. Demmer, R. Patra, K. Fall, “Using redundancy to cope with failures in a delay tolerant network,” SIGCOMM 2005

Symmetric Allocations

We are particularly interested in symmetric allocations because they are easy to describe and implement. In the symmetric allocation with m nonempty nodes, each nonempty node stores T/m and the remaining n − m nodes are empty.

Successful recovery for the symmetric allocation with m nonempty nodes occurs if and only if at least $\lceil m/T \rceil$ out of the m nonempty nodes are accessed.

Therefore, the recovery probability of this allocation is $P[B(m, p) \ge \lceil m/T \rceil]$, where $B(m, p)$ denotes a binomial random variable with m trials and success probability p.

Asymptotic Optimality of Max Spreading

The symmetric allocation that spreads the budget maximally over all n nodes is asymptotically optimal when the budget T is sufficiently large.

RESULT 1
The gap between the recovery probabilities for an optimal allocation and for the symmetric allocation with m = n is at most $pT \cdot P[B(n-1, p) \le \lceil n/T \rceil - 2]$. If p and T are fixed such that $T > 1/p$, then this gap approaches zero as $n \to \infty$.

Proof Idea: Bounding the optimal recovery probability
1. By conditioning on the number of accessed nodes r, we can express the probability of successful recovery as $\sum_{r=1}^{n} p^r (1-p)^{n-r} S_r$, where $S_r$ is the number of successful r-subsets.
2. We can in turn bound $S_r$ by observing that we have $S_r$ inequalities of the form $\sum_{i \in \mathbf{r}} x_i \ge 1$, which can be summed up to produce $\sum_{i=1}^{n} a_{r,i} x_i \ge S_r$, where $a_{r,i} \le \binom{n-1}{r-1}$ is the number of successful r-subsets that contain node i.
3. We therefore have $S_r \le \min\left\{ \binom{n-1}{r-1} T, \binom{n}{r} \right\}$.
4. Applying the bound to the expression in step 1 leads to the conclusion that the optimal recovery probability is at most $pT \cdot P[B(n-1, p) \le \lceil n/T \rceil - 2] + P[B(n, p) \ge \lceil n/T \rceil]$.

Proof Idea: Bounding the suboptimality gap for max spreading
1. The recovery probability of the symmetric allocation with m = n is $P[B(n, p) \ge \lceil n/T \rceil]$.
2. The suboptimality gap for this allocation is therefore at most the difference between the upper bound for the optimal recovery probability and the probability in step 1, which is $pT \cdot P[B(n-1, p) \le \lceil n/T \rceil - 2]$.
3. For $T > 1/p$, we can apply the Chernoff bound to obtain an upper bound on this gap that decays exponentially in n.
4. As $n \to \infty$, this upper bound approaches zero.

Optimal Symmetric Allocation

The problem is nontrivial even when restricted to symmetric allocations.

[Figure; axis label: number of nonempty nodes in the symmetric allocation]

Maximal spreading is optimal among symmetric allocations when the budget T is sufficiently large.

RESULT 2
If [condition on p and T], then either $m = \lfloor \lfloor n/T \rfloor T \rfloor$ or $m = n$ is an optimal symmetric allocation.

Minimal spreading is optimal among symmetric allocations when the budget T is sufficiently small. Coding is unnecessary for such an allocation, since each nonempty node stores at least a unit amount of data.

RESULT 3
If [condition on p and T], then $m = \lfloor T \rfloor$ is an optimal symmetric allocation.

Proof Idea: Finding the optimal symmetric allocation
1. Observe that we can find an optimal m* from among $\lceil n/T \rceil$ candidates: $\lfloor T \rfloor, \lfloor 2T \rfloor, \ldots, \lfloor \lfloor n/T \rfloor T \rfloor$, and $n$. This holds because, for constant p and k, $P[B(m, p) \ge k]$ is a nondecreasing function of m.
2. For $m = \lfloor kT \rfloor$ with $k = 1, \ldots, \lfloor n/T \rfloor$, the recovery probability is $P[B(\lfloor kT \rfloor, p) \ge k]$. Recall that the recovery probability of the symmetric allocation with m nonempty nodes is given by $P[B(m, p) \ge \lceil m/T \rceil]$.
3. RESULT 2 (max spreading optimal) is a sufficient condition on p and T for $P[B(\lfloor kT \rfloor, p) \ge k]$ to be nondecreasing in k.
4. To obtain RESULT 3 (min spreading optimal), we first establish a sufficient condition on p and T for $P[B(\lfloor kT \rfloor, p) \ge k]$ to be nonincreasing in k; we subsequently expand the condition to include other points for which $m = \lfloor T \rfloor$ remains optimal.

[Figure: three labeled regions: maximal spreading is optimal among symmetric allocations; minimal spreading is optimal among symmetric allocations; other symmetric allocations may be optimal in the gap between them]

Conclusion

The optimal allocation is not necessarily symmetric.

However, the symmetric allocation that spreads the budget maximally over all n nodes is asymptotically optimal when the budget is sufficiently large.

Furthermore, we are able to specify the optimal symmetric allocation for a wide range of parameter values of p and T.

Thank you!
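As a numerical check of the motivating example and of the recovery probability for symmetric allocations, the following Python sketch may help. It enumerates every subset of accessed nodes for an arbitrary allocation (exponential in n, so only suitable for small examples) and evaluates the binomial tail for a symmetric allocation. The function names are hypothetical, and the search over m is a plain exhaustive search rather than the candidate-pruning argument from the proof idea.

```python
from fractions import Fraction
from itertools import combinations
from math import ceil, comb


def recovery_probability(allocation, p):
    """Brute force: add up the probabilities of all accessed subsets whose
    stored data totals at least one unit. Exponential in len(allocation)."""
    n = len(allocation)
    total = Fraction(0)
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            if sum(allocation[i] for i in subset) >= 1:
                total += p ** r * (1 - p) ** (n - r)
    return total


def symmetric_recovery_probability(m, T, p):
    """P[Binomial(m, p) >= ceil(m / T)] for the symmetric allocation that
    stores T/m in each of m nonempty nodes (assumes T >= 1)."""
    k = ceil(Fraction(m) / T)
    return sum(comb(m, j) * p ** j * (1 - p) ** (m - j) for j in range(k, m + 1))


def best_symmetric_m(n, T, p):
    """Exhaustive search over m = 1..n (illustrative only)."""
    return max(range(1, n + 1),
               key=lambda m: symmetric_recovery_probability(m, T, p))


p, T = Fraction(2, 3), Fraction(7, 3)
A = [Fraction(7, 15)] * 5
B = [Fraction(7, 6), Fraction(7, 6), 0, 0, 0]
C = [Fraction(2, 3), Fraction(2, 3), Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]

for name, alloc in (("A", A), ("B", B), ("C", C)):
    print(name, round(float(recovery_probability(alloc, p)), 5))
# A 0.79012
# B 0.88889
# C 0.90535

m_star = best_symmetric_m(5, T, p)
print(m_star, symmetric_recovery_probability(m_star, T, p))
```

For these parameters the best symmetric allocation attains 8/9, which is below the 220/243 achieved by the nonsymmetric allocation C, in line with the conclusion that the optimal allocation is not necessarily symmetric.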