Self-Stabilizing Distributed File System

Department of Computer Science, Ben-Gurion University
A BGU – IBM joint project
DFS Motivation
• Performance.
• Fault tolerance: any server can take over any role.
• Locality: files can be placed closer to users (local file access).
What Is Self-Stabilization?
A self-stabilizing system is a system that can recover automatically after (transient) faults occur. The idea is to design a system that can be started in an arbitrary state and still converge to the desired behavior.
Self-Stabilization/S. Dolev
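As a classic illustration of the definition (not part of bguFS), Dijkstra's K-state token ring (1974) can start in any state and still converge to exactly one privileged machine. The sketch below assumes a central daemon that moves one randomly chosen privileged machine per step and K = n; the function names are our own.

```python
import random

def privileged(state):
    """Indices of machines that currently hold a privilege (a token)."""
    n = len(state)
    tokens = [0] if state[0] == state[n - 1] else []
    tokens += [i for i in range(1, n) if state[i] != state[i - 1]]
    return tokens

def fire(state, i, k):
    """Let privileged machine i move, following Dijkstra's K-state rules."""
    if i == 0:
        state[0] = (state[0] + 1) % k   # bottom machine advances its counter
    else:
        state[i] = state[i - 1]         # others copy their left neighbour

def stabilize(state, k, rng, max_steps=100_000):
    """Fire random privileged machines until exactly one token is left."""
    for steps in range(max_steps):
        tokens = privileged(state)      # never empty (see note below)
        if len(tokens) == 1:
            return steps
        fire(state, rng.choice(tokens), k)
    raise RuntimeError("did not stabilize within max_steps")

rng = random.Random(7)
n = 6
state = [rng.randrange(n) for _ in range(n)]  # arbitrary initial state
stabilize(state, k=n, rng=rng)
print(state, privileged(state))
```

At least one token always exists: if no machine i ≥ 1 differs from its left neighbor, all values are equal and machine 0 is privileged. Once a single token remains, each move only passes it along, so the ring stays in a legitimate state.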
Self-Stabilization Motivation
• The combinations and types of faults cannot be fully anticipated in ongoing systems.
• Any ongoing system must therefore be self-stabilizing (or manually monitored).
• A self-stabilizing algorithm can recover from any arbitrary state reached as a result of faults.
Design
• The file system's replication servers are coordinated using a spanning tree.
• The tree is constructed by a self-stabilizing update algorithm using multicast messages.
• Updates are propagated using a self-stabilizing β-synchronizer.
Design (cont.)
• Clients join the replication tree and form a caching tree.
• File leases are used to provide cache consistency.
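A lease can be sketched as a cached copy paired with an expiry time. This is a minimal sketch, assuming the cache is told the current time explicitly and that an expired entry must be refetched from the tree parent; the slides do not specify lease length or renewal, and `Lease`/`LeaseCache` are hypothetical names.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Lease:
    """A time-limited promise that a cached copy of a file is valid."""
    path: str
    expires_at: float  # absolute time in seconds

    def valid(self, now: float) -> bool:
        return now < self.expires_at

class LeaseCache:
    """Client-side cache that serves a file only while its lease holds."""

    def __init__(self) -> None:
        self._entries: dict[str, tuple[Lease, bytes]] = {}

    def put(self, lease: Lease, data: bytes) -> None:
        self._entries[lease.path] = (lease, data)

    def get(self, path: str, now: float) -> bytes | None:
        """Cached bytes, or None if absent or the lease has expired."""
        entry = self._entries.get(path)
        if entry is None:
            return None
        lease, data = entry
        if not lease.valid(now):
            del self._entries[path]  # expired: refetch from the tree parent
            return None
        return data

cache = LeaseCache()
cache.put(Lease("/docs/a.txt", expires_at=10.0), b"hello")
print(cache.get("/docs/a.txt", now=5.0))   # b'hello'
print(cache.get("/docs/a.txt", now=12.0))  # None
```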
Replication Tree
• Using a layered self-stabilizing algorithm, we construct a single spanning tree that contains all the file system servers.
Leader Election
• A single leader coordinates the construction of the spanning tree.
• If no leader exists, a server becomes the leader.
• If more than one leader exists, the server with the minimal ID survives.
• Messages are sent periodically using global multicast (or broadcast).
Leader Election Algorithm
• Every T1 do:
  – If (p = leader) then
    • send-multicast(‘I’m a leader’, p.volume)
    • leader-exists = true
• Every T1 + Td do:
  – If (not leader-exists) then leader = p
  – leader-exists = false
• Upon arrival of an ‘I’m a leader’ message (sender, volume) do:
  – If (p.volume = volume) then
    • If (p = leader) then leader = min(leader, sender)
    • Else leader = sender
    • leader-exists = true
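The election can be simulated to watch it converge from an arbitrary state. This is a minimal sketch under assumptions that are not in the slides: time advances in synchronous integer ticks, every multicast reaches every other server in the same tick, and `leader-exists` is set in the T1 handler only by the current leader (otherwise no server could ever time out). `Server` and `simulate` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Server:
    pid: int
    volume: str
    leader: int          # arbitrary initial value is allowed
    leader_exists: bool  # arbitrary initial value is allowed

def simulate(servers, t1, td, ticks):
    """Run the leader election for `ticks` synchronous time steps."""
    for t in range(1, ticks + 1):
        inbox = []
        if t % t1 == 0:                              # every T1
            for s in servers:
                if s.leader == s.pid:
                    inbox.append((s.pid, s.volume))  # "I'm a leader"
                    s.leader_exists = True
        for sender, volume in inbox:                 # multicast delivery
            for s in servers:
                if s.pid == sender or s.volume != volume:
                    continue
                if s.leader == s.pid:
                    s.leader = min(s.leader, sender) # minimal ID survives
                else:
                    s.leader = sender
                s.leader_exists = True
        if t % (t1 + td) == 0:                       # every T1 + Td
            for s in servers:
                if not s.leader_exists:
                    s.leader = s.pid                 # no leader heard: take over
                s.leader_exists = False

servers = [Server(pid, "vol0", leader=99, leader_exists=True) for pid in (4, 2, 9)]
simulate(servers, t1=2, td=1, ticks=20)
print([(s.pid, s.leader) for s in servers])  # [(4, 2), (2, 2), (9, 2)]
```

Here all three servers time out together and declare themselves leaders; after the next multicast round only server 2, the minimal ID among them, is still a leader, and the others adopt it.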
Spanning Tree Construction
• A network version of the self-stabilizing update algorithm.
• Multicast messages with a limited TTL, so that each message stays within a local radius.
• Define the neighbor relation for the update algorithm: the graph induced by the radius.
• Keep the communication graph connected.
[Figure: Induced graph example]
Update Algorithm
• Collect routing tables from all neighbors in the induced graph.
• Build a distributed BFS spanning tree from the tables.
• Select a manager (local leader) for the tree: the server with the minimal ID.
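The update step can be sketched as a synchronous simulation in which each server rebuilds its routing table from its neighbors' tables, adding one hop per entry, then takes the minimal ID as manager and a neighbor one hop closer to it as its BFS parent. This is a sketch under assumptions: rounds are synchronous, every server knows an upper bound `n_max` on the number of servers (entries at that distance or more are discarded, which is what flushes corrupted IDs), ties between candidate parents go to the smallest ID, and `update_round`/`bfs_tree` are hypothetical names.

```python
def update_round(graph, tables, n_max):
    """One synchronous round of a BFS-style update algorithm.

    graph:  node id -> set of neighbour ids in the induced graph
    tables: node id -> {node id: distance}; may start in any state
    n_max:  assumed upper bound on the number of servers
    """
    new_tables = {}
    for p, neighbours in graph.items():
        table = {p: 0}                        # distance 0 to itself
        for q in neighbours:
            for x, d in tables[q].items():    # read q's routing table
                if x != p and d + 1 < table.get(x, n_max):
                    table[x] = d + 1          # shortest distance seen so far
        new_tables[p] = table                 # entries >= n_max never enter
    return new_tables

def bfs_tree(graph, tables):
    """Manager = minimal ID in the table; parent = a neighbour one hop closer."""
    parent = {}
    for p, neighbours in graph.items():
        root = min(tables[p])
        if root == p:
            parent[p] = None                  # p manages its component
        else:
            d = tables[p][root]
            parent[p] = min(q for q in neighbours if tables[q].get(root) == d - 1)
    return parent

graph = {5: {3, 7}, 3: {5, 9}, 7: {5, 9}, 9: {3, 7}, 2: {8}, 8: {2}}
tables = {p: {0: 1, 42: 0} for p in graph}    # corrupted initial state
for _ in range(2 * len(graph)):
    tables = update_round(graph, tables, n_max=len(graph))
print(bfs_tree(graph, tables))  # {5: 3, 3: None, 7: 5, 9: 3, 2: None, 8: 2}
```

The fake IDs 0 and 42 gain one hop per round and disappear once they reach `n_max`, leaving one BFS tree per connected component, each managed by its minimal ID.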
Tree Optimization
• The update algorithm creates the connected components of the communication graph induced by the radius.
• Goal: find the minimal radius that keeps the graph connected.
• Increase the radius by a factor of 2 until a single component spans the system.
• Run a second instance of the update algorithm with a smaller radius and compare the outputs; if they are the same, decrease the radius.
• Search for the minimal radius using binary search.
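The radius search can be sketched centrally. Because raising the radius only adds edges, connectivity is monotone in the radius, which is what makes doubling followed by binary search valid. Assumptions: integer hop distances between servers are given as a matrix, the radius is at least 1, and a centralized `connected` check (hypothetical) stands in for the slides' comparison of two update-algorithm outputs.

```python
from collections import deque

def connected(dist, r):
    """True if servers form one component when linked at distance <= r."""
    n = len(dist)
    seen, queue = {0}, deque([0])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if v not in seen and dist[u][v] <= r:
                seen.add(v)
                queue.append(v)
    return len(seen) == n

def minimal_radius(dist):
    """Doubling phase followed by binary search (radius >= 1 assumed)."""
    if not connected(dist, max(map(max, dist))):
        raise ValueError("servers are not connected at any radius")
    hi = 1
    while not connected(dist, hi):   # increase by a factor of 2
        hi *= 2
    lo = hi // 2                     # largest radius known to fail (or 0)
    while hi - lo > 1:               # invariant: radius hi is connected
        mid = (lo + hi) // 2
        if connected(dist, mid):
            hi = mid
        else:
            lo = mid
    return hi

positions = [0, 1, 2, 6, 7]          # hypothetical hop positions on a line
dist = [[abs(a - b) for b in positions] for a in positions]
print(minimal_radius(dist))          # 4
```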
[Figure: Tree structure]
Replication Consistency
• A self-stabilizing β-synchronizer verifies that the signatures of accessed files are identical on all servers.
• If more than one signature exists, there is a conflict.
• The leader decides on the correct file content (using a user-defined algorithm) and notifies the servers.
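One check-and-repair step can be sketched as a convergecast of signature counts up the replication tree to the leader, followed by the leader's decision. Assumptions: SHA-256 serves as the file signature, a simple majority stands in for the user-defined decision algorithm, and a single recursive pass stands in for one β-synchronizer pulse; all function names are hypothetical.

```python
import hashlib
from collections import Counter

def signature(content: bytes) -> str:
    """Assumed signature: SHA-256 digest of the file content."""
    return hashlib.sha256(content).hexdigest()

def collect(tree, node, copies):
    """Convergecast: count the signatures held in node's subtree."""
    counts = Counter({signature(copies[node]): 1})
    for child in tree.get(node, ()):
        counts += collect(tree, child, copies)
    return counts

def majority(counts):
    """Hypothetical user-defined rule: the most common signature wins."""
    return counts.most_common(1)[0][0]

def check_and_repair(tree, leader, copies, decide=majority):
    """Leader-side check; returns True if a conflict was found and fixed."""
    counts = collect(tree, leader, copies)
    if len(counts) == 1:
        return False                       # all replicas agree
    winner = decide(counts)
    content = next(c for c in copies.values() if signature(c) == winner)
    for server in copies:                  # stands in for notifying servers
        copies[server] = content
    return True

tree = {"S1": ["S2", "S3"], "S2": ["S4"]}  # leader S1 at the root
copies = {"S1": b"v2", "S2": b"v2", "S3": b"v1", "S4": b"v2"}
print(check_and_repair(tree, "S1", copies))  # True
print(check_and_repair(tree, "S1", copies))  # False
```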
Caching Tree
• Clients extend the replication tree to a caching tree.
• The same update algorithm constructs both the replication tree and the caching tree (minor modifications are required).
[Figure: Cache tree diagram]
File Access
• A read request is sent to the tree parent (either a server or a cache).
• A write request travels to the replication tree root (the leader) and is propagated by the β-synchronizer.
• Cache consistency depends on the propagation mechanism.
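The access paths can be sketched with an in-memory tree: reads climb toward the first node that holds the file and leave copies on the way back, while writes go to the root and are pushed down. Assumptions: the root is the leader, replicas apply writes while caches simply invalidate (one possible propagation policy, since the slides leave it open), and a direct recursive call stands in for the β-synchronizer; `Node` is a hypothetical class.

```python
class Node:
    """A server (replica) or client cache in the caching tree."""

    def __init__(self, name, parent=None, is_server=False):
        self.name, self.parent, self.is_server = name, parent, is_server
        self.children = []
        self.store = {}                 # path -> bytes
        if parent is not None:
            parent.children.append(self)

    def read(self, path):
        """Serve locally if possible, else ask the parent and keep a copy."""
        if path in self.store:
            return self.store[path]
        if self.parent is None:
            raise FileNotFoundError(path)
        data = self.parent.read(path)
        self.store[path] = data
        return data

    def write(self, path, data):
        """Forward the write to the root (leader), then push it down."""
        root = self
        while root.parent is not None:
            root = root.parent
        root._propagate(path, data)

    def _propagate(self, path, data):
        if self.is_server:
            self.store[path] = data     # replicas apply the update
        else:
            self.store.pop(path, None)  # caches drop the stale copy
        for child in self.children:
            child._propagate(path, data)

leader = Node("S1", is_server=True)
s2 = Node("S2", leader, is_server=True)
c1 = Node("C1", s2)
c2 = Node("C2", c1)
for server in (leader, s2):
    server.store["/f"] = b"v1"
print(c2.read("/f"))   # b'v1' (now cached at C1 and C2)
c2.write("/f", b"v2")
print(c2.read("/f"))   # b'v2'
```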
[Figure: Read/write example]
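Linux-Based bguFS (1): Kernel VFS Module
[Figure: architecture diagram]
• User level: Application; SyncDaemon (cache manager & server) with network communication.
• Kernel level: VFS; bguFS module (cache check: valid data?); local file system.
• Interactions shown: upcalls, kernel update, and updates.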
Linux-Based bguFS (2): C Library Solution
[Figure: architecture diagram]
• User level: Application; Linux libc library with a new implementation of the library file commands ("C" commands: fopen, fclose, fread, fwrite, etc.); SyncDaemon (cache manager & server) with network communication.
• Interactions shown: upcalls.
Tasks
• Leader election and a radius-based spanning tree.
• Optimal radius (binary) search and the β-synchronizer.
• Implementation of distributed file read/write operations.
• Kernel VFS module (1).
• C library “hacking” solution (2).