
UBE529
Distributed Algorithms
Self Stabilization
Self-Stabilization
Formalizing the notion of self-stabilization
A toy problem: “Rotating Privilege on A Ring”
The very first self-stabilization algorithm
More for theoretical interest


A practical problem: “Self-Stabilizing Spanning Tree Construction”
Very useful in multicast (BitTorrent-style data streaming)

2
Introduction
Self-stabilization: Tolerate ‘data faults’

Example: Parent pointers in a spanning tree getting corrupted
Assume that the code does not get corrupted
System state: legal or illegal
Faults may result in an illegal system state
Self-stabilizing system: Irrespective of the initial state, it always reaches a legal state in finite time
3
Motivation for Self-Stabilization
(Motivation in the book is not very practical)
A multicast tree: Each node records who its parent and children are
Distributed systems can get into an illegal state due to
Topology changes
Failures / reboots
Malicious processes
Generally called “faults”
[Figure callout: parent value of these two nodes no longer valid]
4
Motivation for Self-Stabilization
Distributed systems can get into an illegal state due to
Topology changes
Failures / reboots
Malicious processes
Generally called “faults”
Mobile ad hoc networks: Maintaining the shortest route back to sink
[Figure callout: A should now have an improved route back to sink]
5
Defining Self-Stabilization
The state (i.e., data state in all processes) of a distributed system is either legal or illegal
Definition based on application semantics
The code on each process is assumed to be correct all the time
A distributed algorithm is self-stabilizing if
Starting from any (legal or illegal) state, the protocol will eventually reach a legal state if there are no more faults
Once the system is in a legal state, it will only transit to other legal states unless there are faults
Intuitively, it will always recover from faults and, once recovered, will stay recovered forever
A self-stabilizing algorithm typically runs in the background and never stops
6
Mutual Exclusion
Legal state: Exactly one machine in the system is ‘privileged’
Assume there are N machines 0 … N-1
Each machine is a K-State machine
Label the possible states from the set {0…K-1}

There is one special machine called the bottom machine
L, S, R = States of left machine, self, right machine respectively
7
Algorithm
Bottom: Privileged if L=S
Other machines: Privileged if L ≠ S
8
Algorithm: A move by bottom machine
9
Algorithm: A move by a normal machine
10
Another Example
11
Implementation
Each process needs to query its left neighbor
Instead of periodic queries use a TOKEN for message efficiency
What if the token gets lost?
Bottom machine maintains a timer
If it does not receive a token for a long time it regenerates the token
Multiple tokens do not affect the correctness of the algorithm



12
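//Program for the bottom node
import java.util.Timer;

public class StableBottom extends Process implements Lock {
    int myState = 0;            // S: this machine's state
    int leftState = 0;          // L: last state received from the left neighbor
    int next;                   // id of the right neighbor on the ring
    Timer t = new Timer();
    boolean tokenSent = false;  // true if a token went out since the last timer tick

    public StableBottom(Linker initComm) {
        super(initComm);
        next = (myId + 1) % N;
    }
    // Start the periodic timer; RestartTask (defined elsewhere) is expected
    // to trigger the "restart" branch of handleMsg
    public synchronized void initiate() {
        t.schedule(new RestartTask(this), 1000, 1000);
    }
    // Bottom is privileged when L = S
    public synchronized void requestCS() {
        while (leftState != myState) myWait();
    }
    // The bottom machine's move: S := (L + 1) mod N
    public synchronized void releaseCS() {
        myState = (leftState + 1) % N;
    }
    // On a timer tick: regenerate the token only if none was sent since the last tick
    public synchronized void sendToken() {
        if (!tokenSent) {
            sendMsg(next, "token", myState);
            tokenSent = true;
        } else tokenSent = false;
    }
    public synchronized void handleMsg(Message m, int src, String tag) {
        if (tag.equals("token")) {
            leftState = m.getMessageInt();  // learn the left neighbor's state
            notify();                       // wake up a waiting requestCS()
            Util.mySleep(1000);
            sendMsg(next, "token", myState);
            tokenSent = true;
        } else if (tag.equals("restart"))
            sendToken();
    }
}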
13
//Program for a normal node
public class StableNormal extends Process implements Lock {
    int myState = 0;    // S: this machine's state
    int leftState = 0;  // L: last state received from the left neighbor

    public StableNormal(Linker initComm) {
        super(initComm);
    }
    // A normal machine is privileged when L != S
    public synchronized void requestCS() {
        while (leftState == myState) myWait();
    }
    // A normal machine's move: S := L, then pass the token on
    public synchronized void releaseCS() {
        myState = leftState;
        sendToken();
    }
    public synchronized void sendToken() {
        int next = (myId + 1) % N;  // right neighbor on the ring
        sendMsg(next, "token", myState);
    }
    public synchronized void handleMsg(Message m, int src, String tag) {
        if (tag.equals("token")) {
            leftState = m.getMessageInt();  // learn the left neighbor's state
            notify();                       // wake up a waiting requestCS()
            Util.mySleep(1000);
            sendToken();
        }
    }
}
14
Dijkstra’s 2nd Algorithm for Mutual Exclusion
Bottom :
if (B + 1 = R) then B := B + 2 ;
Normal :
if (L=S+1) or (R=S+1) then S := S+1;
Top :
if (L=B) and (T!=B+1) then T :=B+1;
• 3 states per machine {0,1,2}; all arithmetic is modulo 3
• An array of processors
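The three rules above can be tried out with a small simulation. This is only a sketch under stated assumptions: the class and method names are made up for illustration, machine 0 is taken as the bottom and machine n-1 as the top, the top reads the bottom's state as B, and a central scheduler lets one privileged machine move at a time (here always the one with the lowest index).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch of the three-state algorithm (names are hypothetical). */
class ThreeStateMutex {
    /** True if the guard of machine i holds. All arithmetic is modulo 3. */
    static boolean privileged(int[] s, int i) {
        int n = s.length;
        if (i == 0) {                        // Bottom: B + 1 = R
            return (s[0] + 1) % 3 == s[1];
        }
        if (i == n - 1) {                    // Top: L = B and T != B + 1
            return s[n - 2] == s[0] && s[n - 1] != (s[0] + 1) % 3;
        }
        int next = (s[i] + 1) % 3;           // Normal: L = S + 1 or R = S + 1
        return s[i - 1] == next || s[i + 1] == next;
    }

    /** Executes the move of machine i (caller checks that i is privileged). */
    static void move(int[] s, int i) {
        int n = s.length;
        if (i == 0) {
            s[0] = (s[0] + 2) % 3;           // B := B + 2
        } else if (i == n - 1) {
            s[n - 1] = (s[0] + 1) % 3;       // T := B + 1
        } else {
            s[i] = (s[i] + 1) % 3;           // S := S + 1
        }
    }

    static List<Integer> privilegedMachines(int[] s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length; i++) {
            if (privileged(s, i)) result.add(i);
        }
        return result;
    }

    /** Moves the lowest-index privileged machine until exactly one is privileged. */
    static int stabilize(int[] s, int maxMoves) {
        int moves = 0;
        List<Integer> p = privilegedMachines(s);
        while (p.size() != 1) {
            if (moves == maxMoves) throw new IllegalStateException("did not stabilize");
            move(s, p.get(0));
            moves++;
            p = privilegedMachines(s);
        }
        return moves;
    }

    public static void main(String[] args) {
        int[] s = {0, 1, 2, 1, 0};           // arbitrary, possibly illegal, start state
        int moves = stabilize(s, 1000);
        System.out.println("legal after " + moves + " moves: " + Arrays.toString(s));
    }
}
```

The scheduler choice is arbitrary; any rule that picks one privileged machine at a time would serve the same purpose in this sketch.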
15
Second Alg. Example
16
Use of First Alg : The Rotating Privilege Problem
A ring of n processes, each process can only communicate with
neighbors
There is a privilege in the system
At any time, only one node may have the privilege (you can think of
this as a token)
The node with the privilege may for example, have exclusive access
to some resource
The privilege needs to “rotate” among the nodes so that each node
has a chance



17
The Rotating Privilege Algorithm
Each process i has a local integer variable V_i
0 ≤ V_i ≤ k where k is some constant no smaller than n
Example: n = 5 and k = 12
[Figure: ring of five processes with values 12, 3, 0, 9, 12]
18
Red process’s action:
Retrieve value L of my clockwise neighbor;
Let V be my value;
if (L == V) { // I have the privilege
// complete whatever I want to do;
V = (V+1) % k;
}
Green process’s action:
Retrieve value L of my clockwise neighbor;
Let V be my value;
if (L != V) { // I have the privilege
// complete whatever I want to do;
V = L;
}
Each process executes each action repeatedly. We will assume each action happens instantaneously (for this algorithm only).
[Figure: the example ring with values 12, 3, 0, 9, 12]
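A minimal simulation of the red and green actions, written as a sketch rather than a reference implementation. Assumptions made here and not stated on the slide: process 0 is the red process, process i reads the value of process i-1 (wrapping around) as its clockwise neighbor, start values lie in 0..k-1, and a random scheduler runs one instantaneous action at a time. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative sketch of the rotating privilege algorithm (names are hypothetical). */
class RotatingPrivilege {
    /** Value that process i retrieves from its clockwise neighbor (assumed to be i-1). */
    static int neighborValue(int[] v, int i) {
        return v[(i - 1 + v.length) % v.length];
    }

    /** Red (process 0) is privileged if L == V; a green process if L != V. */
    static boolean privileged(int[] v, int i) {
        int l = neighborValue(v, i);
        return (i == 0) ? l == v[0] : l != v[i];
    }

    static int countPrivileged(int[] v) {
        int count = 0;
        for (int i = 0; i < v.length; i++) {
            if (privileged(v, i)) count++;
        }
        return count;
    }

    /** The move of a privileged process: red increments modulo k, green copies L. */
    static void move(int[] v, int i, int k) {
        if (i == 0) {
            v[0] = (v[0] + 1) % k;
        } else {
            v[i] = neighborValue(v, i);
        }
    }

    /** Lets randomly chosen processes act until exactly one holds the privilege. */
    static int stabilize(int[] v, int k, Random rng) {
        int moves = 0;
        while (countPrivileged(v) != 1) {
            int i = rng.nextInt(v.length);
            if (privileged(v, i)) {          // a non-privileged process does nothing
                move(v, i, k);
                moves++;
            }
        }
        return moves;
    }

    public static void main(String[] args) {
        int[] v = {3, 0, 9, 11, 3};          // n = 5, k = 12, arbitrary start values
        int moves = stabilize(v, 12, new Random(1));
        System.out.println("legal after " + moves + " moves: " + Arrays.toString(v));
    }
}
```

The loop always terminates on a move by some process because at least one process is privileged in every state: if no green process is privileged, all values are equal and the red process is.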
19
[Figure: example ring states with process values 0, 1, and 2]
20
What’s Interesting about the Algorithm
This problem is mainly of theoretical interest
What is interesting about it:
Regardless of the initial values of the processes, eventually the system will get into a legal state and stay in legal states
Self-stabilizing!


21
Legal States
We say that a process makes a “move” if it has the privilege and changes its value
The system is in a legal state if exactly one machine can make a move
Easy to prove that in any state, at least one machine can make a move
Lemma: The following are legal states and are the only legal states
All n values are the same, OR
Only two different values forming two consecutive bands, and one band starts from the red process
To prove these are the only legal states, consider the value V of the red process and the value L of its clockwise neighbor
Case I: V = L
Case II: V ≠ L
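For small rings the lemma can be checked by brute force. The sketch below enumerates every state for given n and k and compares "exactly one process can move" with the two shapes named in the lemma. It assumes process 0 is red and that process i reads process i-1 (wrapping around); the class and method names are hypothetical.

```java
/** Illustrative brute-force check of the legal-state lemma (names are hypothetical). */
class LegalStates {
    /** Number of processes that can make a move in state v. */
    static int countPrivileged(int[] v) {
        int n = v.length;
        int count = (v[n - 1] == v[0]) ? 1 : 0;   // red: privileged if L == V
        for (int i = 1; i < n; i++) {
            if (v[i - 1] != v[i]) count++;        // green: privileged if L != V
        }
        return count;
    }

    /** True if all values are equal, or v is one band starting at red followed by a second band. */
    static boolean matchesLemma(int[] v) {
        int boundaries = 0;
        for (int i = 1; i < v.length; i++) {
            if (v[i] != v[i - 1]) boundaries++;
        }
        return boundaries <= 1;                   // 0: all equal, 1: exactly two bands
    }

    /** Checks "legal <=> lemma shape" for every one of the k^n states. */
    static boolean lemmaHoldsForAll(int n, int k) {
        long total = 1;
        for (int i = 0; i < n; i++) total *= k;
        int[] v = new int[n];
        for (long code = 0; code < total; code++) {
            long c = code;
            for (int i = 0; i < n; i++) {         // decode the state number in base k
                v[i] = (int) (c % k);
                c /= k;
            }
            if ((countPrivileged(v) == 1) != matchesLemma(v)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("lemma holds for n = 4, k = 5: " + lemmaHoldsForAll(4, 5));
    }
}
```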
22
Legal States → Legal States
Theorem: If the system is in a legal state, then it will stay in legal states
Our assumption on instantaneous actions will simplify this proof
We can consider actions one by one


23
Illegal States → Legal States
Lemma: Let P be a green process, and let Q be P’s clockwise neighbor. If Q makes i moves, then P can make at most i+1 moves.
Lemma: Let Q be the red process. If Q makes i moves, then system-wide there can be at most the following number of moves:
i + (i+1) + (i+2) + ... + (i+n-1) = ni + n(n-1)/2
Lemma: Let Q be the red process, and consider a sequence of n^2 moves in the system. Q makes at least one move in the sequence.
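The arithmetic linking these lemmas can be written out as follows. This is a worked sketch: it applies the first lemma once per green process going around the ring away from Q, so the j-th green process makes at most i + j moves.

```latex
\begin{align*}
\text{total moves} &\le \sum_{j=0}^{n-1} (i + j) = ni + \frac{n(n-1)}{2}, \\
i = 0 \;\Longrightarrow\; \text{total moves} &\le \frac{n(n-1)}{2} < n^2 .
\end{align*}
```

So if Q made no move at all, fewer than n^2 moves could happen system-wide, which is why any sequence of n^2 moves must contain a move by Q.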
24
Illegal States → Legal States
Lemma 1: Regardless of the starting state, the system eventually reaches a state T where the red process has a different value from all other processes (though the system may not stay in such states)
Proof: Let Q be the red process. If in the starting state Q has the same value as some other process, then there must be an integer j (0 ≤ j ≤ k-1) that is not the value of any process. Q will eventually take j as its value.
(It takes Q at most n moves to do so.)
25
Illegal States → Legal States
Lemma 2: If the system is in a state T where the red process has a different value from all other processes, then the system will eventually reach a state where all processes have the same value (though the system may not stay in such states)
Theorem: Regardless of the initial states of the system, the system will
eventually reach a legal state.

Proof: From Lemma 1 and Lemma 2.
26
Self-stabilizing Dominating Partition (Hedetniemi)
R1 : if x(i) = 0 ∧ (∀ j ∈ N(i)) (x(j) = 0)
then x(i) = 1
R2 : if x(i) = 1 ∧ (∀ j ∈ N(i)) (x(j) = 1)
then x(i) = 0
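A small sketch of how R1 and R2 play out on a concrete graph. Assumptions made for the example: the graph is given as adjacency lists, no node is isolated, and one node moves at a time (nodes are scanned in index order). The names are hypothetical.

```java
import java.util.Arrays;

/** Illustrative sketch of the dominating partition rules (names are hypothetical). */
class DominatingPartition {
    /** R1 or R2 applies to node i exactly when every neighbor holds the same value as i. */
    static boolean enabled(int[][] adj, int[] x, int i) {
        for (int j : adj[i]) {
            if (x[j] != x[i]) return false;
        }
        return true;
    }

    /** Applies the rules one node at a time until no rule is enabled; returns the move count. */
    static int stabilize(int[][] adj, int[] x) {
        int moves = 0;
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int i = 0; i < x.length; i++) {
                if (enabled(adj, x, i)) {
                    x[i] = 1 - x[i];             // R1 flips 0 to 1, R2 flips 1 to 0
                    moves++;
                    changed = true;
                }
            }
        }
        return moves;
    }

    /** True if every node has at least one neighbor in the other set. */
    static boolean isDominatingPartition(int[][] adj, int[] x) {
        for (int i = 0; i < x.length; i++) {
            boolean hasOther = false;
            for (int j : adj[i]) {
                if (x[j] != x[i]) hasOther = true;
            }
            if (!hasOther) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] path = {{1}, {0, 2}, {1, 3}, {2, 4}, {3}};   // path 0-1-2-3-4
        int[] x = {0, 0, 0, 0, 0};
        int moves = stabilize(path, x);
        System.out.println(moves + " moves, x = " + Arrays.toString(x));
    }
}
```

Each move turns all edges at the moving node from "same value at both ends" into "different values", so in this one-at-a-time setting the number of moves is bounded by the number of edges.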
27
Hedetniemi Example
All transformations are by R1
28
Hedetniemi MIS Algorithm
R1 : if s(i) = 0 ∧ (∀ j ∈ N(i)) (s(j) = 0)
then s(i) = 1
R2 : if s(i) = 1 ∧ (∃ j ∈ N(i)) (s(j) = 1)
then s(i) = 0
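The MIS rules can be exercised the same way. The sketch assumes adjacency lists and a scheduler that lets one node move at a time (index order); s(i) = 1 is read as "node i is in the set". All names are hypothetical.

```java
import java.util.Arrays;

/** Illustrative sketch of the MIS rules R1 and R2 (names are hypothetical). */
class MisStabilizer {
    static boolean hasNeighborInSet(int[][] adj, int[] s, int i) {
        for (int j : adj[i]) {
            if (s[j] == 1) return true;
        }
        return false;
    }

    /** Applies R1/R2 one node at a time until neither rule is enabled; returns the move count. */
    static int stabilize(int[][] adj, int[] s) {
        int moves = 0;
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int i = 0; i < s.length; i++) {
                boolean neighborIn = hasNeighborInSet(adj, s, i);
                if (s[i] == 0 && !neighborIn) {          // R1: enter the set
                    s[i] = 1;
                    moves++;
                    changed = true;
                } else if (s[i] == 1 && neighborIn) {    // R2: leave the set
                    s[i] = 0;
                    moves++;
                    changed = true;
                }
            }
        }
        return moves;
    }

    /** Independent (no two adjacent members) and maximal (every non-member has a member neighbor). */
    static boolean isMaximalIndependentSet(int[][] adj, int[] s) {
        for (int i = 0; i < s.length; i++) {
            boolean neighborIn = hasNeighborInSet(adj, s, i);
            if (s[i] == 1 && neighborIn) return false;
            if (s[i] == 0 && !neighborIn) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] cycle = {{4, 1}, {0, 2}, {1, 3}, {2, 4}, {3, 0}};   // ring of 5 nodes
        int[] s = {1, 1, 1, 1, 1};
        int moves = stabilize(cycle, s);
        System.out.println(moves + " moves, s = " + Arrays.toString(s));
    }
}
```

In this one-at-a-time setting a node that enters the set by R1 never leaves again, because none of its neighbors can enter while it is a member. Each node therefore makes at most two moves.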
29
Self Stabilizing Spanning Tree Algorithm
Given n processes connected by an undirected graph and one special process P1, construct a spanning tree rooted at P1
Not all processes can communicate with all processes directly
A very useful / practical algorithm
Can also be used to compute shortest path
[Figure: example graph with P1 as the root]
30
Self Stabilizing Spanning Tree Algorithm
Each process maintains two variables
parent: Who my parent is
dist: My distance to root
Runs in the background
parent and dist are continuously updated
At any given point of time, the values of the two variables can be wrong
Due to “faults” such as topology changes resulting from node movement



31
Self Stabilizing Spanning Tree Algorithm
On P1 (executed periodically):
dist = 0; parent = -1;
On all other processes (executed periodically):
Retrieve dist from all neighbors
Set my own parent = my neighbor with the smallest dist (tie break if needed)
Set my own dist = 1 + (the smallest dist received)
[Figure: example graph on processes P1 to P8, each labeled with a (dist, parent) pair. Red values are initially incorrect values; green values are values that have become correct. Labels shown: (3, P5), (9, P3), (8, P4), (1, P1), (2, P3), (0, P8), (5, P7), (6, P7)]
32
Self Stabilizing Spanning Tree Algorithm
[Figure, next step of the example; the algorithm text is unchanged. Labels shown: (0, -1), (9, P3), (8, P4), (1, P1), (2, P3), (0, P8), (5, P7), (6, P7)]
33
Self Stabilizing Spanning Tree Algorithm
[Figure, next step of the example; the algorithm text is unchanged. Labels shown: (0, -1), (1, P1), (1, P1), (1, P1), (1, P1), (0, P8), (5, P7), (6, P7)]
34
Self Stabilizing Spanning Tree Algorithm
[Figure, next step of the example; the algorithm text is unchanged. Labels shown: (0, -1), (1, P1), (1, P1), (1, P1), (1, P1), (0, P8), (1, P7), (2, P6)]
35
Self Stabilizing Spanning Tree Algorithm
[Figure, next step of the example; the algorithm text is unchanged. Labels shown: (0, -1), (1, P1), (1, P1), (1, P1), (1, P1), (2, P4), (1, P7), (2, P6)]
36
Self Stabilizing Spanning Tree Algorithm
[Figure, next step of the example; the algorithm text is unchanged. Labels shown: (0, -1), (1, P1), (1, P1), (1, P1), (1, P1), (2, P4), (2, P5), (2, P6)]
37
Self Stabilizing Spanning Tree Algorithm
[Figure, final step of the example; the algorithm text is unchanged. Labels shown: (0, -1), (1, P1), (1, P1), (3, P4), (1, P1), (3, P4), (2, P5), (2, P6)]
38
Self-stabilizing spanning tree
Maintain a spanning tree rooted at the ‘root’ node
A data fault may corrupt the ‘parent’ pointer at any node
Recalculate parent pointers regularly
39
Algorithm
dist maintains the distance of a node from the root
40
Algorithm
The root periodically sets parent to -1 (null) and dist to 0
A non-root reads dist from all neighbors and points its parent to the node with the least distance from the root
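The two rules can be simulated on a small graph. This is a sketch under assumptions that the slides do not fix: processes are numbered from 0 with process 0 as the root, the graph is connected and given as adjacency lists, ties go to the first listed neighbor, and "periodically" is modeled as repeated sweeps over all processes. All names are hypothetical.

```java
import java.util.Arrays;

/** Illustrative sketch of the self-stabilizing spanning tree rules (names are hypothetical). */
class SpanningTreeStabilizer {
    /** One action of process i; process 0 plays the role of the root. */
    static void act(int[][] adj, int[] dist, int[] parent, int i) {
        if (i == 0) {                        // root: dist = 0, parent = -1
            dist[0] = 0;
            parent[0] = -1;
            return;
        }
        int best = adj[i][0];                // neighbor with the smallest dist so far
        for (int j : adj[i]) {
            if (dist[j] < dist[best]) best = j;   // ties keep the earlier neighbor
        }
        parent[i] = best;
        dist[i] = dist[best] + 1;
    }

    /** Sweeps over all processes until a full sweep changes nothing; returns the sweep count. */
    static int stabilize(int[][] adj, int[] dist, int[] parent) {
        int sweeps = 0;
        boolean changed = true;
        while (changed) {
            int[] oldDist = dist.clone();
            int[] oldParent = parent.clone();
            for (int i = 0; i < dist.length; i++) {
                act(adj, dist, parent, i);
            }
            changed = !Arrays.equals(oldDist, dist) || !Arrays.equals(oldParent, parent);
            sweeps++;
        }
        return sweeps;
    }

    public static void main(String[] args) {
        // Edges: 0-1, 0-2, 1-3, 2-3, 3-4
        int[][] adj = {{1, 2}, {0, 3}, {0, 3}, {1, 2, 4}, {3}};
        int[] dist = {3, 9, 8, 1, 0};        // arbitrary (corrupted) start values
        int[] parent = {4, 2, 3, 0, 1};
        int sweeps = stabilize(adj, dist, parent);
        System.out.println(sweeps + " sweeps, dist = " + Arrays.toString(dist)
                + ", parent = " + Arrays.toString(parent));
    }
}
```

The start values play the role of a data fault: they are never trusted, only overwritten from what the neighbors currently report.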
41
Correctness Proof
Define a phase to be the minimum time period where each process has executed its code at least once (called “has taken an action”)
Some process may execute its code more than once
The definition of a phase here is different from a round in synchronous systems!
Let A_i be the length of the shortest path from process i to the root, and let dist_i be the value of dist on process i
dist_i is not allowed to be negative

42
Correctness Proof
Lemma: At the end of phase 1, dist_1 = 0 and dist_i ≥ 1 for any i ≥ 2
Lemma: At the end of phase 2,
Any process i whose A_i = 0, we have dist_i = 0;
Any process i whose A_i = 1, we have dist_i = 1;
Any process i whose A_i ≥ 2, we have dist_i ≥ 2;



43
Correctness Proof
Lemma: At the end of phase r,
Any process i whose A_i ≤ r-1, we have dist_i = A_i;
Any process i whose A_i ≥ r, we have dist_i ≥ r;
Prove by induction: assume the lemma holds at phase r, now consider phase r+1, we need to prove
Any process i whose A_i ≤ r-1, we have dist_i = A_i;
Any process i whose A_i = r, we have dist_i = A_i;
Any process i whose A_i ≥ r+1, we have dist_i ≥ r+1;
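The lemma can be spot-checked on one concrete execution. The sketch below is an illustration, not a proof: it assumes process 0 is the root, a connected graph, non-negative start values, and a particular schedule in which each phase is one sweep where every process acts exactly once in index order. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Illustrative check of the phase lemma on one schedule (names are hypothetical). */
class PhaseLemmaCheck {
    /** A_i: length of the shortest path from process i to the root (process 0), by BFS. */
    static int[] shortestPaths(int[][] adj) {
        int[] a = new int[adj.length];
        Arrays.fill(a, -1);
        a[0] = 0;
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        queue.add(0);
        while (!queue.isEmpty()) {
            int u = queue.poll();
            for (int v : adj[u]) {
                if (a[v] < 0) {
                    a[v] = a[u] + 1;
                    queue.add(v);
                }
            }
        }
        return a;
    }

    /** One phase: every process takes exactly one action, in index order. */
    static void phase(int[][] adj, int[] dist) {
        dist[0] = 0;                                   // root action
        for (int i = 1; i < dist.length; i++) {
            int min = Integer.MAX_VALUE;
            for (int j : adj[i]) min = Math.min(min, dist[j]);
            dist[i] = min + 1;                         // 1 + smallest dist received
        }
    }

    /** True if both conditions of the lemma hold at the end of phases 1..phases. */
    static boolean lemmaHolds(int[][] adj, int[] dist, int phases) {
        int[] a = shortestPaths(adj);
        for (int r = 1; r <= phases; r++) {
            phase(adj, dist);
            for (int i = 0; i < dist.length; i++) {
                if (a[i] <= r - 1 && dist[i] != a[i]) return false;  // A_i <= r-1: exact
                if (a[i] >= r && dist[i] < r) return false;          // A_i >= r: dist_i >= r
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] adj = {{1, 2}, {0, 3}, {0, 3}, {1, 2, 4}, {3}};
        int[] dist = {3, 9, 8, 1, 0};                  // arbitrary non-negative start values
        System.out.println("lemma holds: " + lemmaHolds(adj, dist, 6)
                + ", dist = " + Arrays.toString(dist));
    }
}
```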



44
Correctness Proof
Consider all t actions taken during phase r+1
We will use an induction on t
This proof is tricky if this is your first self-stabilization proof
A process may take multiple actions in a phase!
Processes may take actions in parallel – cannot assume a serialization of all actions!
The proof technique is typical for proving self-stabilization
Step 1: Prove that the t actions will not roll back what is already achieved so far (no backward move)
Step 2: Prove that at some point, each node will achieve more (forward move)
Step 3: Prove that the t actions will not roll back the effects of the forward move (no backward move after the forward move)
45
Step 1: The t actions will not change the green conditions
Proof: Induction on t and consider action (t+1) by some process. (Cannot
assume action (t+1) happens after the t actions.) Regardless of what values
the process draws from its neighbors, the action will not end up violating the
condition.
for nodes with | already know, phase r (green) | want to show, phase r+1 (red)
A_i ≤ r-1      | dist_i = A_i                  | dist_i = A_i
A_i = r        | dist_i ≥ r                    | dist_i = A_i
A_i ≥ r+1      | dist_i ≥ r                    | dist_i ≥ r+1
46
Step 1: The t actions will not change the green conditions satisfied at the
beginning of phase r
Proof (continued): True because a level A_i process only has neighbors from levels A_i – 1, A_i, and A_i + 1.
47
Step 2: For each process, at some point during phase r+1, it will satisfy the
red conditions
Proof: By definition of a phase, each process will take at least one action
during phase r+1
48
Step 3: For each process, after it first satisfies the red condition, it will
continue to satisfy the red condition for the remainder of the phase
Proof: Trivial – but do need to enumerate three cases
49
Correctness Proof
Theorem: After H+1 phases, A_i = dist_i on all processes
H being the length of the shortest path from the farthest process to the root
(By the lemma, at the end of phase r every process with A_i ≤ r-1 has dist_i = A_i, so r = H+1 covers all processes)
Theorem: After H+1 phases, the dist and parent values on all processes are correct
Proof: Each process has a single parent pointer except the root. So the graph has n nodes and n-1 edges. Each process has a path to the root, thus the graph is connected.

50
Acknowledgements
This part is heavily dependent on the course CS4231 Parallel and Distributed Algorithms, NUS, by Dr. Haifeng Yu, and on Vijay Garg’s book Elements of Distributed Computing.
51