Data Persistence in Large-scale Sensor Networks with Decentralized Fountain Codes
Yunfeng Lin, Ben Liang, Baochun Li
INFOCOM 2007

Outline
Introduction
Preliminaries
Persistent Data Access
   Two-way random walks
   EDFC and ADFC
Discussion of Multiple Encoded Blocks
Performance Evaluation
Conclusion

Introduction (1/5)
It has been a conventional assumption that the data measured by individual sensors are gathered and processed at powered sinks.
[Figure: sensor data reaching sinks with Internet connections via data aggregation.]
This assumption may not hold in practice, for example in large-scale sensor networks or in networks deployed in inaccessible geographical regions.

Introduction (2/5)
Our proposed vision:
Ask the sensors to collaboratively store measured data over a historical period of time.
At a later, convenient time, a collector gathers the measured data directly from the sensors.
PUSH model: sensors send data periodically.
PULL model: sensors are passively polled by the collector.

Introduction (3/5)
We propose a novel decentralized implementation of fountain codes in sensor networks, in which data are encoded in a distributed fashion:
Each sensor disseminates its data to a random subset of sensors in the network.
Each sensor encodes only the data it has received.
The collector decodes the original data after collecting a sufficient number of encoded data blocks.

Introduction (4/5)
Our decentralized implementation of fountain codes does not require the support of a generic routing layer: no routing tables and no geographical routing protocols are needed. Data are disseminated with random walks.

Introduction (5/5)
[Figure: sensing nodes send source blocks of sensed data to caching nodes, which store encoded blocks; despite node failures, a collector gathers encoded blocks and decodes the sensed data. Legend: sensing nodes, caching nodes.]

Preliminaries: Why Fountain Codes?
Replication (backup sensors): a large number of replicas are required.
Error-correcting codes: implemented in a centralized fashion.
Random linear codes: decentralized, but the decoding process is computationally expensive.
Decoding random linear codes costs $O(K^3)$.
Fountain codes: low decoding complexity, $O(K \ln K)$, and superior decoding performance.
(Source: S. K. Chang, "Digital Fountain Codes V.S. Reed-Solomon Code For Streaming Applications".)

Preliminaries: LT Codes
In LT codes, the K source blocks can be decoded from any subset of $K + O(\sqrt{K}\,\ln^2(K/\delta))$ encoded blocks with probability $1 - \delta$.
Degree: the number of source blocks used to generate an encoded block.
The degree distribution of encoded blocks in LT codes follows the Robust Soliton distribution.

Preliminaries: LT Codes
Ideal Soliton distribution:
$$\rho(1) = \frac{1}{K}, \qquad \rho(i) = \frac{1}{i(i-1)} \quad \text{for } i = 2, 3, \ldots, K.$$
Let $R = c \ln(K/\delta) \sqrt{K}$ and
$$\tau(i) = \begin{cases} \dfrac{R}{iK} & \text{for } i = 1, \ldots, K/R - 1,\\[4pt] \dfrac{R \ln(R/\delta)}{K} & \text{for } i = K/R,\\[4pt] 0 & \text{for } i = K/R + 1, \ldots, K. \end{cases}$$
Robust Soliton distribution:
$$\mu(i) = \frac{\rho(i) + \tau(i)}{\sum_{j=1}^{K} \left(\rho(j) + \tau(j)\right)}.$$

Preliminaries: LT Codes
Example of the Robust Soliton distribution with K = 10000, c = 0.2, and δ = 0.05, for which K/R = 41.
[Figure: μ(i) versus i, with a spike at i = K/R.]
Encoded blocks with a degree higher than K/R are not essential for decoding.

Preliminaries: Random Walks on Graphs
We describe random walks in the context of disseminating a source block, with each sensor as a node in the graph.
The next hop is chosen at random from the neighbors of the current node.
A random walk corresponds to a time-reversible Markov chain.
In this paper, we choose a variant of the Metropolis algorithm, a generalization of the natural random walk to Markov chains with a non-uniform steady-state distribution.

Preliminaries: Metropolis Algorithm
The Metropolis algorithm computes the transition matrix $P = [P_{ij}]$ of a random walk with a prescribed steady-state distribution $\pi = (\pi_1, \pi_2, \ldots)$.
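The following is a minimal sketch of the Metropolis construction in Python. It uses exact fractions so that stationarity can be checked exactly. The four-node graph, the target π, and the function names are hypothetical and do not come from the paper. The sketch applies the rule on this slide: $P_{ij} = \min(1, \pi_j/\pi_i)/M$ for each neighbor j of i, with the leftover probability mass kept as a self-loop. It then contrasts this with the natural random walk, whose next hop is uniform over the neighbors and whose steady state is proportional to node degree rather than to π.

```python
from fractions import Fraction


def metropolis_matrix(neighbors, pi):
    """Metropolis transition matrix for the target distribution pi.

    P[i][j] = min(1, pi[j] / pi[i]) / M for each neighbor j of i, 0 for other
    j != i, and P[i][i] keeps the leftover mass (a self-loop). M is the maximal
    node degree, so every row sums to 1 and P[i][i] >= 0.
    """
    n = len(pi)
    M = max(len(nbrs) for nbrs in neighbors.values())
    P = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            P[i][j] = min(Fraction(1), pi[j] / pi[i]) / M
        P[i][i] = 1 - sum(P[i][j] for j in range(n) if j != i)
    return P


def natural_walk_matrix(neighbors):
    """Natural random walk: the next hop is uniform over the current node's neighbors."""
    n = len(neighbors)
    P = [[Fraction(0)] * n for _ in range(n)]
    for i, nbrs in neighbors.items():
        for j in nbrs:
            P[i][j] = Fraction(1, len(nbrs))
    return P


def step(dist, P):
    """One step of the Markov chain: returns the row vector dist * P."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


# Hypothetical undirected 4-node graph given as adjacency lists.
neighbors = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2]}
# Hypothetical non-uniform target steady-state distribution.
pi = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]

P = metropolis_matrix(neighbors, pi)
Q = natural_walk_matrix(neighbors)
print(step(pi, P) == pi)  # True: pi is stationary under the Metropolis walk
print(step(pi, Q) == pi)  # False: the natural walk settles on a different distribution
```

Detailed balance holds by construction, because $\pi_i P_{ij} = \min(\pi_i, \pi_j)/M$ is symmetric in i and j. For that reason π is stationary whenever the neighbor relation is symmetric.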
Let $N(i)$ denote the set of neighbors of node $i$ and $M$ the maximal node degree in the graph. Then
$$P_{ij} = \begin{cases} \min(1, \pi_j/\pi_i)/M & \text{if } i \neq j \text{ and } j \in N(i),\\ 0 & \text{if } i \neq j \text{ and } j \notin N(i),\\ 1 - \sum_{k \neq i} P_{ik} & \text{if } i = j. \end{cases}$$

Persistent Data Access: Decentralized Fountain Codes
[Figure: decentralized fountain codes based on two-way random walks. A caching node of degree d requests source blocks from the sensing nodes, and the returned source blocks are encoded into an encoded block. Legend: K sensing nodes, N caching nodes.]

Persistent Data Access: Decentralized Fountain Codes
We seek to construct decentralized fountain codes with only one traversal of random walks, from the sensing nodes to the caching nodes.
Caching nodes: encode and store the source blocks.
Collector: decodes the source blocks.
We propose two heuristic algorithms, EDFC and ADFC. They are designed to obtain the Robust Soliton distribution of LT codes exactly (EDFC) or approximately (ADFC).

Persistent Data Access: Exact Decentralized Fountain Codes
Random walks introduce randomization, so the number of distinct source blocks a node receives is uncertain. More than d source blocks must therefore be disseminated to each node of degree d.
Redundancy coefficient $x_d$: each node of degree d is assumed to receive $x_d d$ blocks. A larger $x_d$ lowers the probability that the node receives fewer than d distinct source blocks.

Persistent Data Access: Exact Decentralized Fountain Codes
Number of random walks launched by each sensing node:
$$b = \frac{N \sum_{d=1}^{K} x_d\, d\, \mu(d)}{K}.$$
Probabilistic forwarding tables: a node of degree d should receive $x_d d$ of the $bK$ random walks, so the target steady-state distribution is
$$\pi_d = \frac{x_d d}{bK} = \frac{x_d d}{N \sum_{i=1}^{K} x_i\, i\, \mu(i)}.$$

Persistent Data Access: Exact Decentralized Fountain Codes
[Figure: EDFC. Sensing nodes disseminate source blocks; each caching node of degree d stores an encoded block of degree d, and the collector decodes. The nodes use π_d, the forwarding table, and the number of random walks. Legend: K sensing nodes, N caching nodes.]

Persistent Data Access: Exact Decentralized Fountain Codes
The steps of EDFC are:
Step 1. Degree generation: draw d from the Robust Soliton distribution.
Step 2. Compute the steady-state distribution $\pi_d$.
Step 3. Compute the probabilistic forwarding table by the Metropolis algorithm.
Step 4. Compute the number of random walks b.
Step 5. Block dissemination
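   based on the probabilistic forwarding table.
Step 6. Encoding: bitwise XOR of a subset of d source blocks.
The IDs of the source nodes are attached to each encoded block.

Persistent Data Access: Exact Decentralized Fountain Codes
Overhead ratio, relative to the $b_0 = NE/K$ random walks of the ideal algorithm, where $E = \sum_{d=1}^{K} d\,\mu(d)$:
$$g = \frac{b}{b_0} = \frac{\sum_{d=1}^{K} x_d\, d\, \mu(d)}{\sum_{d=1}^{K} d\, \mu(d)}.$$
Violation probability: let X be the degree chosen by a node and Y the number of distinct source blocks it receives. Let $Y_i = 1$ if the node receives source block i. Then
$$p_d = \Pr(Y_i = 1 \mid X = d) = 1 - (1 - \pi_d)^b = 1 - \left(1 - \frac{x_d d}{bK}\right)^b \approx 1 - e^{-x_d d/K},$$
$$\Pr(Y < d \mid X = d) = \sum_{j=0}^{d-1} \binom{K}{j} p_d^{\,j} (1 - p_d)^{K-j}.$$
Optimization problem (a trade-off between coding performance and communication overhead):
$$\min_{x} \sum_{d=1}^{K} x_d\, d\, \mu(d) \quad \text{subject to} \quad \Pr(Y < d \mid X = d) \le \delta_d,\ \ x_d \ge 1, \quad d = 1, \ldots, K/R,$$
where the violation probability can be bounded by
$$\Pr(Y < d \mid X = d) \le \binom{K}{d} (1 - p_d)^{K-d} \le \left(\frac{eK}{d}\right)^{d} e^{-x_d d (K-d)/K}.$$

Persistent Data Access: Exact Decentralized Fountain Codes
The optimization problem is solved in MATLAB with the following parameter setting:
$\delta_d$ (bound on the violation probabilities) = 0.05
N (total number of nodes) = 2000
K (number of sensing nodes) = 1000
c = 0.01, δ = 0.05
Further numerical computation gives an overhead ratio of 1.4508.

Persistent Data Access: Approximate Decentralized Fountain Codes
To avoid the redundant random walks of EDFC, ADFC designs a new, hypothetical chosen degree distribution $\upsilon(\cdot)$.
Number of random walks:
$$b = \frac{N \sum_{d=1}^{K} d\,\upsilon(d)}{K} = \frac{NE}{K}, \qquad \text{where here } E = \sum_{i=1}^{K} i\,\upsilon(i).$$
Steady-state distribution of the random walks:
$$\pi_d = \frac{d}{N \sum_{i=1}^{K} i\,\upsilon(i)} = \frac{d}{NE}.$$

Persistent Data Access: Approximate Decentralized Fountain Codes
A node of chosen degree d receives a given source block with probability
$$p_d = \Pr(Y_i = 1 \mid X = d) = 1 - \left(1 - \frac{d}{NE}\right)^{NE/K} \approx 1 - e^{-d/K}.$$
The actual degree distribution of a node is therefore
$$\upsilon'(d') = \Pr(Y = d') = \sum_{d=1}^{K} \Pr(X = d)\,\Pr(Y = d' \mid X = d) = \sum_{d=1}^{K} \upsilon(d) \binom{K}{d'} p_d^{\,d'} (1 - p_d)^{K-d'}.$$
Optimization problem: minimize the mean-square error between $\upsilon'(\cdot)$ and $\mu(\cdot)$,
$$\min_{\upsilon} \sum_{i=1}^{K/R} \left(\upsilon'(i) - \mu(i)\right)^2 \quad \text{subject to} \quad \sum_{i=1}^{K} \upsilon(i) = 1,\ \ \upsilon(i) \ge 0 \ \text{ for } i = 1, \ldots, K.$$

Persistent Data Access: Approximate Decentralized Fountain Codes
The steps of ADFC are:
Step 1. Degree generation: draw d from the chosen degree distribution $\upsilon(\cdot)$.
Step 2. Compute the steady-state distribution $\pi_d$.
Step 3. Compute the probabilistic forwarding table by the Metropolis algorithm.
Step 4.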
   Compute the number of random walks b.
Step 5. Block dissemination based on the probabilistic forwarding table.
Step 6. Encoding: bitwise XOR of all received source blocks.
The IDs of the source nodes are attached to each encoded block.

Persistent Data Access: Approximate Decentralized Fountain Codes
Overhead ratio of ADFC, where b is the number of random walks in ADFC and $b_0$ the number in the ideal algorithm:
$$g_2 = \frac{b}{b_0} = \frac{\sum_{d=1}^{K} d\,\upsilon(d)}{\sum_{d=1}^{K} d\,\mu(d)}.$$
By further numerical computation, the overhead ratio $g_2$ is only 0.2326, so less transmission cost is required. But...

Persistent Data Access: Approximate Decentralized Fountain Codes
Parameter setting: N (total number of nodes) = 2000, K (number of sensing nodes) = 1000, c = 0.01, δ = 0.05.
[Figure: the Robust Soliton distribution, the chosen degree distribution υ(·), and the actual degree distribution; the actual distribution deviates from the Robust Soliton distribution (inaccuracy).]

Discussion of Multiple Encoded Blocks
[Figure: a cache node receives source blocks from sensing nodes and stores encoded blocks; the encoding may lose some information.]
Does maintaining several different encoded blocks on a node improve the coding performance?

Discussion of Multiple Encoded Blocks
Theorem 2. When the code-degree distribution conforms to the Robust Soliton distribution, even if the source blocks on each node are not encoded, the collector must visit Ω(K) nodes in order to collect all source blocks with probability 1 − δ, where δ is a small positive number.
Let $Y_{i,j}$ be a random variable that equals 1 if source block j is collected when visiting the i-th node. Since the average degree of an encoded block is at most $c_1 \ln(K/\delta)$ [3],
$$\Pr(Y_{i,j} = 1) = \sum_{d=1}^{K} \Pr(X_i = d)\,\Pr(Y_{i,j} = 1 \mid X_i = d) = \sum_{d=1}^{K} \mu(d)\,\frac{d}{K} \le \frac{c_1 \ln(K/\delta)}{K}.$$

Discussion of Multiple Encoded Blocks
Let $Z_j = 1$ if source block j is collected after visiting M nodes. Then
$$\Pr(Z_j = 0) = \prod_{i=1}^{M} \Pr(Y_{i,j} = 0) = \prod_{i=1}^{M} \left(1 - \Pr(Y_{i,j} = 1)\right) \ge \left(1 - \frac{c_1 \ln(K/\delta)}{K}\right)^{M}.$$
Let E denote the event that all blocks are collected after visiting M nodes.
$$\Pr(E) = \prod_{j=1}^{K} \Pr(Z_j = 1) \le \left(1 - \left(1 - \frac{c_1 \ln(K/\delta)}{K}\right)^{M}\right)^{K}.$$
For all blocks to be collected with probability at least $1 - \delta$, we need
$$\left(1 - \left(1 - \frac{c_1 \ln(K/\delta)}{K}\right)^{M}\right)^{K} \ge 1 - \delta.$$

Discussion of Multiple Encoded Blocks
Apply the logarithm to both sides and use $\ln(1 - y) \approx -y$ for small $y$:
$$K \ln\left(1 - \left(1 - \frac{c_1 \ln(K/\delta)}{K}\right)^{M}\right) \ge \ln(1 - \delta) \approx -\delta \quad\Longrightarrow\quad \left(1 - \frac{c_1 \ln(K/\delta)}{K}\right)^{M} \le \frac{\delta}{K}.$$
By a similar approximation, $\left(1 - c_1 \ln(K/\delta)/K\right)^{M} \approx e^{-c_1 M \ln(K/\delta)/K}$. This gives $M \ge K/c_1$, i.e., $M = \Omega(K)$. The collector needs to visit Ω(K) nodes to collect all K source blocks.

Performance Evaluation
We implement both the original centralized implementation and the decentralized implementation of fountain codes.
Centralized implementation of fountain codes: used to evaluate effectiveness and performance; about 1000 lines of C++ code, with optimized encoding and decoding algorithms.
Decentralized implementation of fountain codes: also simulated in C++.

Performance Evaluation
We use a two-dimensional geometric random graph as the topological model:
N sensors are uniformly distributed on a unit disk.
K sensing nodes are uniformly distributed among the N sensors.
Radio range: r.
We set K = 10000, N = 20000, and r = 0.033 in most experiments; the average number of neighbors per node is 21.

Performance Evaluation: Communication Cost and Decoding Ratio
Two main performance metrics:
Communication cost: determined by the length of the random walks and the number of random walks.
Decoding ratio: the number of nodes a collector needs to visit for decoding, normalized by the number of sensing nodes; it measures fault tolerance.

Performance Evaluation: Communication Cost and Decoding Ratio
[Figure: impact of the length of random walks on the decoding ratio. Each data point shows the average and the 95% confidence interval over 10 experiments.]

Performance Evaluation: Communication Cost and Decoding Ratio
[Figure: ratio of the dissemination costs of EDFC and ADFC to that of the two-way algorithm.]
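The dissemination costs compared above depend on the EDFC redundancy coefficients $x_d$, which can be explored numerically. What follows is a minimal sketch, not the authors' MATLAB procedure. The objective $\sum_d x_d\, d\, \mu(d)$ increases in every $x_d$, and each violation constraint involves only its own $x_d$. The smallest $x_d \ge 1$ meeting $\Pr(Y < d \mid X = d) \le \delta_d$ can therefore be found degree by degree, here by bisection on the exact binomial sum with $p_d = 1 - e^{-x_d d/K}$. Two assumptions apply. First, degrees above K/R keep $x_d = 1$, because the slides constrain only $d \le K/R$. Second, K/R is rounded to the nearest integer. All function names are hypothetical.

```python
import math


def robust_soliton(K, c, delta):
    """Robust Soliton distribution mu[1..K] (mu[0] is unused) and the spike K/R."""
    R = c * math.log(K / delta) * math.sqrt(K)
    spike = round(K / R)  # K/R rounded to an integer (41 for K=10000, c=0.2, delta=0.05)
    w = [0.0] * (K + 1)
    for i in range(1, K + 1):
        rho = 1.0 / K if i == 1 else 1.0 / (i * (i - 1))
        if i < spike:
            tau = R / (i * K)
        elif i == spike:
            tau = R * math.log(R / delta) / K
        else:
            tau = 0.0
        w[i] = rho + tau
    total = sum(w)
    return [wi / total for wi in w], spike


def violation_prob(x, d, K, log_binom):
    """Pr(Y < d | X = d) for Y ~ Binomial(K, p) with p = 1 - exp(-x * d / K)."""
    lq = -x * d / K                 # log(1 - p), exact by construction
    lp = math.log(-math.expm1(lq))  # log(p), computed stably
    return sum(math.exp(log_binom[j] + j * lp + (K - j) * lq) for j in range(d))


def min_redundancy(d, K, delta_d, log_binom, iters=30):
    """Smallest x >= 1 (up to bisection tolerance) with violation_prob <= delta_d.
    The violation probability falls as x grows, so doubling plus bisection works."""
    def feasible(x):
        return violation_prob(x, d, K, log_binom) <= delta_d

    if feasible(1.0):
        return 1.0
    lo, hi = 1.0, 2.0
    while not feasible(hi):
        lo, hi = hi, 2.0 * hi
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi


def edfc_overhead(K, c, delta, delta_d):
    """Redundancy coefficients x_d for d = 1..K/R and the overhead ratio g."""
    mu, spike = robust_soliton(K, c, delta)
    log_binom = [math.lgamma(K + 1) - math.lgamma(j + 1) - math.lgamma(K - j + 1)
                 for j in range(K + 1)]
    x = [1.0] * (K + 1)  # assumption: degrees above K/R keep x_d = 1
    for d in range(1, spike + 1):
        x[d] = min_redundancy(d, K, delta_d, log_binom)
    E = sum(d * mu[d] for d in range(1, K + 1))
    g = sum(x[d] * d * mu[d] for d in range(1, K + 1)) / E
    return x, g, spike


# Parameter setting from the EDFC slides.
x, g, spike = edfc_overhead(K=1000, c=0.01, delta=0.05, delta_d=0.05)
print(spike, round(x[1], 4), round(g, 4))
```

The script prints the spike degree, $x_1$, and g. Since $\Pr(Y < 1 \mid X = 1) = e^{-x_1}$, $x_1$ should come out at about $\ln 20 \approx 2.996$. The printed g need not match the 1.4508 reported on the slides, because the treatment of degrees above K/R and the solver both differ.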
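The ADFC objective can likewise be evaluated for any candidate $\upsilon(\cdot)$. This minimal sketch assumes the approximation $p_d \approx 1 - e^{-d/K}$ and, as a hypothetical starting point, $\upsilon = \mu$. It computes the actual degree distribution $\upsilon'(\cdot)$ and its mean-square error against $\mu(\cdot)$ over degrees 1 to K/R. It does not solve the constrained minimization itself, which is a quadratic program because $\upsilon'$ is linear in $\upsilon$. It only shows why $\upsilon$ must differ from $\mu$. Since $K(1 - e^{-d/K}) < d$, a node receives on average fewer distinct blocks than its chosen degree. Function names are hypothetical.

```python
import math


def robust_soliton(K, c, delta):
    """Robust Soliton distribution mu[1..K] (mu[0] is unused) and the spike K/R."""
    R = c * math.log(K / delta) * math.sqrt(K)
    spike = round(K / R)
    w = [0.0] * (K + 1)
    for i in range(1, K + 1):
        rho = 1.0 / K if i == 1 else 1.0 / (i * (i - 1))
        if i < spike:
            tau = R / (i * K)
        elif i == spike:
            tau = R * math.log(R / delta) / K
        else:
            tau = 0.0
        w[i] = rho + tau
    total = sum(w)
    return [wi / total for wi in w], spike


def actual_degree_distribution(upsilon, K):
    """v[k] = sum_d upsilon[d] * C(K, k) * p_d**k * (1 - p_d)**(K - k) with
    p_d = 1 - exp(-d / K): the distribution of the number of distinct source
    blocks a node actually receives when it draws its degree d from upsilon."""
    log_binom = [math.lgamma(K + 1) - math.lgamma(k + 1) - math.lgamma(K - k + 1)
                 for k in range(K + 1)]
    v = [0.0] * (K + 1)
    for d in range(1, K + 1):
        if upsilon[d] == 0.0:
            continue
        lq = -d / K                     # log(1 - p_d)
        lp = math.log(-math.expm1(lq))  # log(p_d)
        for k in range(K + 1):
            v[k] += upsilon[d] * math.exp(log_binom[k] + k * lp + (K - k) * lq)
    return v


def mse_to_robust_soliton(v, mu, spike):
    """ADFC objective: sum over i = 1..K/R of (v[i] - mu[i]) ** 2."""
    return sum((v[i] - mu[i]) ** 2 for i in range(1, spike + 1))


def mean_degree(dist):
    return sum(i * p for i, p in enumerate(dist))


K = 1000
mu, spike = robust_soliton(K, c=0.01, delta=0.05)
v = actual_degree_distribution(mu, K)  # hypothetical choice: upsilon = mu
print(round(mean_degree(mu), 3), round(mean_degree(v), 3))
print(mse_to_robust_soliton(v, mu, spike))
```

With $\upsilon = \mu$, the mean of $\upsilon'$ falls below the mean of $\mu$. ADFC chooses $\upsilon(\cdot)$ to reduce this mismatch, although, as the ADFC figure indicates, the actual distribution still deviates.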
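The reported average of 21 neighbors can be checked against the geometric random graph model. This minimal sketch assumes the unit disk has radius 1; an area-1 disk with the same r would give about $N\pi r^2 \approx 68$ neighbors. Ignoring boundary effects, a node has about $(N-1)r^2$ neighbors in expectation, which is 21.78 for N = 20000 and r = 0.033. Nodes near the edge of the disk have fewer neighbors, which pulls the simulated average slightly lower. The function names and the random seed are hypothetical.

```python
import math
import random
from collections import defaultdict


def random_points_in_unit_disk(n, rng):
    """n points uniform on a disk of radius 1 (sqrt makes the density uniform in area)."""
    pts = []
    for _ in range(n):
        rad = math.sqrt(rng.random())
        ang = 2.0 * math.pi * rng.random()
        pts.append((rad * math.cos(ang), rad * math.sin(ang)))
    return pts


def average_degree(pts, r):
    """Mean number of other nodes within radio range r, using a grid of cell size r
    so each node is only compared with points in the 3 x 3 block of cells around it."""
    grid = defaultdict(list)
    for x, y in pts:
        grid[(math.floor(x / r), math.floor(y / r))].append((x, y))
    r2 = r * r
    total = 0
    for (cx, cy), members in grid.items():
        nearby = [p for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                  for p in grid.get((cx + dx, cy + dy), ())]
        for xi, yi in members:
            total += sum(1 for xj, yj in nearby
                         if (xj - xi) ** 2 + (yj - yi) ** 2 <= r2)
    return (total - len(pts)) / len(pts)  # drop each node's match with itself


N, r = 20000, 0.033
pts = random_points_in_unit_disk(N, random.Random(7))
print(round((N - 1) * r * r, 2))  # 21.78: expected degree ignoring the disk boundary
print(round(average_degree(pts, r), 2))
```

A grid with cell size r guarantees that any node within range lies in the same or an adjacent cell. This keeps the count close to linear in N instead of comparing all pairs.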
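The decoding ratio can be illustrated in the idealized, centralized case, where every visited node holds one LT-encoded block whose degree follows $\mu$ exactly. In this minimal sketch, each encoded block is the bitwise XOR of d distinct source blocks and carries their IDs, as in step 6 of EDFC. The collector runs the standard LT peeling decoder and counts how many blocks it gathers before all K source blocks are recovered. The parameters c = 0.1 and δ = 0.05, the 32-bit block size, and the function names are illustrative assumptions, not values from the slides.

```python
import math
import random
from collections import defaultdict
from itertools import accumulate


def robust_soliton(K, c, delta):
    """Robust Soliton distribution mu[1..K] (mu[0] is unused) and the spike K/R."""
    R = c * math.log(K / delta) * math.sqrt(K)
    spike = round(K / R)
    w = [0.0] * (K + 1)
    for i in range(1, K + 1):
        rho = 1.0 / K if i == 1 else 1.0 / (i * (i - 1))
        tau = R / (i * K) if i < spike else (R * math.log(R / delta) / K if i == spike else 0.0)
        w[i] = rho + tau
    total = sum(w)
    return [wi / total for wi in w], spike


def lt_decoding_ratio(K, mu, rng, block_bits=32):
    """Gather LT-encoded blocks (degree d ~ mu, XOR of d distinct source blocks,
    source IDs attached) until peeling decoding recovers all K source blocks.
    Returns the number of encoded blocks gathered divided by K."""
    source = [rng.getrandbits(block_bits) for _ in range(K)]
    cum = list(accumulate(mu[1:]))
    decoded = {}               # source id -> recovered value
    index = defaultdict(list)  # source id -> pending blocks [ids, value] containing it
    received = 0
    while len(decoded) < K:
        d = rng.choices(range(1, K + 1), cum_weights=cum, k=1)[0]
        ids = set(rng.sample(range(K), d))
        value = 0
        for s in ids:
            value ^= source[s]
        received += 1
        for s in [s for s in ids if s in decoded]:  # strip already-recovered sources
            ids.discard(s)
            value ^= decoded[s]
        block = [ids, value]
        queue = []
        if len(ids) == 1:
            queue.append(block)
        else:
            for s in ids:  # empty when every source in the block is already known
                index[s].append(block)
        while queue:  # peeling: each degree-1 block releases one source block
            ids_b, val_b = queue.pop()
            if len(ids_b) != 1:
                continue
            (s,) = ids_b
            if s in decoded:
                continue
            decoded[s] = val_b
            for other in index.pop(s, []):
                if s in other[0]:
                    other[0].discard(s)
                    other[1] ^= val_b
                    if len(other[0]) == 1:
                        queue.append(other)
    assert all(decoded[s] == source[s] for s in range(K))
    return received / K


mu, _ = robust_soliton(1000, c=0.1, delta=0.05)  # illustrative parameters
print(round(lt_decoding_ratio(1000, mu, random.Random(3)), 3))
```

Each peeled encoded block releases at most one new source block, so the ratio can never fall below 1. How far it exceeds 1 is the decoding overhead.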
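The Ω(K) behavior in Theorem 2 can also be illustrated by simulation. This minimal sketch assumes the following:

- Each visited node stores d unencoded source blocks.
- d is drawn independently from $\mu$.
- The blocks are chosen uniformly without replacement.
- The supply of nodes is unlimited.

The sketch counts how many nodes a collector visits before it has seen every source block. A simulation is not a proof. It only shows that the count stays a constant fraction of K. Function names and the seed are hypothetical.

```python
import math
import random
from itertools import accumulate


def robust_soliton(K, c, delta):
    """Robust Soliton distribution mu[1..K] (mu[0] is unused) and the spike K/R."""
    R = c * math.log(K / delta) * math.sqrt(K)
    spike = round(K / R)
    w = [0.0] * (K + 1)
    for i in range(1, K + 1):
        rho = 1.0 / K if i == 1 else 1.0 / (i * (i - 1))
        tau = R / (i * K) if i < spike else (R * math.log(R / delta) / K if i == spike else 0.0)
        w[i] = rho + tau
    total = sum(w)
    return [wi / total for wi in w], spike


def nodes_until_all_collected(K, mu, rng):
    """Visit nodes one at a time; each stores d ~ mu distinct, unencoded source
    blocks chosen uniformly at random. Return how many nodes the collector
    visits before it has seen all K source blocks."""
    cum = list(accumulate(mu[1:]))
    seen = set()
    visited = 0
    while len(seen) < K:
        d = rng.choices(range(1, K + 1), cum_weights=cum, k=1)[0]
        seen.update(rng.sample(range(K), d))
        visited += 1
    return visited


K = 1000
mu, _ = robust_soliton(K, c=0.01, delta=0.05)
rng = random.Random(1)
trials = [nodes_until_all_collected(K, mu, rng) for _ in range(20)]
avg_degree = sum(d * p for d, p in enumerate(mu))
print(sum(trials) / (len(trials) * K))    # nodes visited, normalized by K
print(round(math.log(K) / avg_degree, 3))  # rough coupon-collector estimate (ln K) / E
```

A visited node holds a given block with probability $E/K$, where $E = \sum_d d\,\mu(d)$ is the average degree. Reaching every block therefore takes on the order of $(K/E)\ln K$ visits. Because E grows only logarithmically in K, this is still linear in K, even though nothing is encoded.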
Performance Evaluation: Multiple Encoded Blocks Cannot Do Better
Theorem 2: keeping multiple encoded blocks on each node does not offer any asymptotic performance advantage over keeping a single encoded block.
[Figure: number of nodes to be visited before collecting all source blocks.]
The collector needs to visit close to K nodes even if the source blocks are not encoded.

Performance Evaluation: Overestimation of K and N
Sensor failures are common events in large-scale sensor networks, so it is not feasible to update K and N at all nodes whenever they change. Instead, K and N are updated periodically, and each node may overestimate them.

Performance Evaluation: Overestimation of K and N
[Figure: consequence of overestimating N, the total number of nodes; actual N = 20000.]

Performance Evaluation: Overestimation of K and N
[Figure: impact of overestimating K, the number of sensing nodes; estimated K = 10000.]
EDFC is more robust.

Conclusion
In this paper, we seek to improve fault tolerance and data persistence in sensor networks.
We exploit the superior decoding performance and low decoding complexity of fountain codes through a decentralized implementation that disseminates the original data throughout the network with random walks.
As the number of nodes scales up, the proposed algorithms are able to provide near-optimal fault tolerance with minimal demand on local storage.