Self-Stabilizing Spanning Tree Algorithm

CS4231
Parallel and Distributed Algorithms
AY 2006/2007 Semester 2
Lecture 10
Instructor: Haifeng YU
Review of Last Lecture
System/Failure Model | Consensus Protocol
Ver 0: No node or link failures | Trivial: all-to-all broadcast
Ver 1: Node crash failures; channels are reliable; synchronous | (f+1)-round protocol can tolerate f crash failures
Ver 2: No node failures; channels may drop messages (the coordinated attack problem) | Impossible without error; randomized algorithm with 1/r error probability
Ver 3: Node crash failures; channels are reliable; asynchronous | Impossible (the FLP theorem)
Ver 4: Node Byzantine failures; channels are reliable; synchronous (the Byzantine Generals problem) | If n ≤ 3f, impossible; if n ≥ 4f+1, we have a (2f+2)-round protocol
CS4231 Parallel and Distributed Algorithms AY2006/2007 Semester 2
Today’s Roadmap
- Chapter 18 "Self-Stabilization"
- Formalizing the notion of self-stabilization
- A toy problem: "Rotating Privilege on a Ring"
  - The very first self-stabilization algorithm
  - More for theoretical interest
- A practical problem: "Self-Stabilizing Spanning Tree Construction"
  - Very useful in multicast (BitTorrent-style data streaming)
Motivation for Self-Stabilization
- (The motivation given in the book is not very practical.)
- Distributed systems can get into an illegal state due to
  - Topology changes
  - Failures / reboots
  - Malicious processes
- These are generally called "faults"
[Figure: a multicast tree in which each node records who its parent and children are; after a fault, the parent values of two of the nodes are no longer valid.]
Motivation for Self-Stabilization (continued)
- The same kinds of faults (topology changes, failures / reboots, malicious processes) can put other systems into an illegal state
[Figure: mobile ad hoc networks, maintaining the shortest route back to the sink; after the network changes, node A should now have an improved route back to the sink.]
Defining Self-Stabilization
- The state (i.e., the data state in all processes) of a distributed system is either legal or illegal
  - The definition is based on application semantics
  - The code on each process is assumed to be correct all the time
- A distributed algorithm is self-stabilizing if
  - Starting from any (legal or illegal) state, the protocol will eventually reach a legal state if there are no more faults
  - Once the system is in a legal state, it will only transit to other legal states unless there are faults
- Intuitively, the system always recovers from faults and, once recovered, stays recovered as long as no new faults occur
- A self-stabilizing algorithm typically runs in the background and never stops
The Rotating Privilege Problem
- A ring of n processes; each process can only communicate with its neighbors
- There is a privilege in the system
  - At any time, only one node may have the privilege (you can think of this as a token)
  - The node with the privilege may, for example, have exclusive access to some resource
- The privilege needs to "rotate" among the nodes so that each node has a chance
The Rotating Privilege Algorithm
- Each process i has a local integer variable V_i
  - 0 ≤ V_i ≤ k-1, where k is some constant with k ≥ n
[Figure: example ring with n = 5 and k = 12.]
Red process’s action:
    Retrieve value L of my clockwise neighbor;
    Let V be my value;
    if (L == V) {  // I have the privilege
        // complete whatever I want to do;
        V = (V+1) % k;
    }

Green process’s action:
    Retrieve value L of my clockwise neighbor;
    Let V be my value;
    if (L != V) {  // I have the privilege
        // complete whatever I want to do;
        V = L;
    }

- There is exactly one red process; all other processes are green
- Each process executes its action repeatedly; we will assume each action happens instantaneously (for this algorithm only)
[Figure: an example execution of the algorithm, showing the process values after successive moves.]
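A minimal Python sketch of the red and green actions follows. It assumes, as our own choice rather than the slides', that process 0 is the red process and that the clockwise neighbor of process i is process (i - 1) mod n. Each call to `act` is one atomic action, matching the instantaneous-action assumption. A random scheduler stands in for "each process executes its action repeatedly". The names `privileged`, `act`, `count_privileged` and `run` are hypothetical.

```python
import random

# Hypothetical indexing (not from the slides): process 0 is the red process,
# and the clockwise neighbor of process i is process (i - 1) mod n.

def privileged(values, i):
    """True if process i currently has the privilege (can make a move)."""
    left = values[i - 1]  # L; for i == 0 this wraps around to values[n - 1]
    if i == 0:
        return left == values[0]  # red process: privilege when L == V
    return left != values[i]      # green process: privilege when L != V

def act(values, i, k):
    """Process i executes its action once (atomically). Returns True on a move."""
    if not privileged(values, i):
        return False
    if i == 0:
        values[0] = (values[0] + 1) % k  # red: V = (V + 1) % k
    else:
        values[i] = values[i - 1]        # green: V = L
    return True

def count_privileged(values):
    return sum(privileged(values, i) for i in range(len(values)))

def run(values, k, steps, rng):
    """A random scheduler picks one process per step; returns the number of
    privileged processes observed after each step."""
    history = []
    for _ in range(steps):
        act(values, rng.randrange(len(values)), k)
        history.append(count_privileged(values))
    return history

if __name__ == "__main__":
    rng = random.Random(2007)
    n, k = 5, 12
    values = [rng.randrange(k) for _ in range(n)]  # arbitrary, possibly illegal
    history = run(values, k, 5000, rng)
    first = history.index(1)  # first step after which exactly one can move
    print("legal after step", first + 1, "; stays legal:",
          all(c == 1 for c in history[first:]))
```

A random schedule only illustrates convergence and closure on sample runs. It does not replace the proofs that follow.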
What’s Interesting about the Algorithm
- This problem is mainly of theoretical interest
- What is interesting about it:
  - Regardless of the initial values of the processes, the system will eventually get into a legal state and stay in legal states
  - Self-stabilizing!
Legal States
- We say that a process makes a "move" if it has the privilege and changes its value
- The system is in a legal state if exactly one process can make a move
- It is easy to prove that in any state, at least one process can make a move: if no green process can move, every green process has the same value as its clockwise neighbor, so all n values are equal and the red process can move
- Lemma: The following are legal states, and they are the only legal states:
  - All n values are the same, OR
  - There are only two different values, forming two consecutive bands, and one band starts from the red process
- To prove that these are the only legal states, consider the value V of the red process and the value L of its clockwise neighbor
  - Case I: V = L
  - Case II: V ≠ L
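For small rings, the lemma can be checked exhaustively by enumerating every state. This sketch assumes process 0 is red and process i reads from process (i - 1) mod n, which is an illustrative choice. Under that indexing, "two bands, one starting from the red process" means at most one change of value when reading the values in index order. The function names are hypothetical.

```python
from itertools import product

# Hypothetical indexing: process 0 is red; process i reads process (i - 1) mod n.

def num_privileged(values):
    n = len(values)
    red = 1 if values[n - 1] == values[0] else 0                    # red: L == V
    greens = sum(values[i - 1] != values[i] for i in range(1, n))  # greens: L != V
    return red + greens

def lemma_shape(values):
    """All values equal, or two bands a..a b..b whose first band starts at the
    red process (index 0): at most one value change in index order."""
    return sum(values[i - 1] != values[i] for i in range(1, len(values))) <= 1

def check_legal_states(n, k):
    """Enumerate all k**n states and compare 'exactly one can move' with the lemma."""
    for values in product(range(k), repeat=n):
        count = num_privileged(values)
        assert count >= 1, values                          # someone can always move
        assert (count == 1) == lemma_shape(values), values
    return True

if __name__ == "__main__":
    print(check_legal_states(4, 4), check_legal_states(5, 5))
```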
Legal States → Legal States
- Theorem: If the system is in a legal state, then it will stay in legal states
  - Our assumption of instantaneous actions simplifies this proof
  - We can consider actions one by one
Illegal States → Legal States
- Lemma: Let P be a green process, and let Q be P’s clockwise neighbor. If Q makes i moves, then P can make at most i+1 moves. (After P moves it equals Q, so between two moves of P, Q must move.)
- Lemma: Let Q be the red process. If Q makes i moves, then system-wide there can be at most the following number of moves:
  $i + (i+1) + (i+2) + \cdots + (i+n-1) = ni + \frac{n(n-1)}{2}$
- Lemma: Let Q be the red process, and consider a sequence of n^2 moves in the system. Q makes at least one move in the sequence. (With i = 0, at most n(n-1)/2 < n^2 moves are possible.)
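The three counting lemmas can be spot-checked on random executions. This sketch assumes process 0 is red and process i reads process (i - 1) mod n. It records which process made each move and asserts three things:

- the per-neighbor bound,
- the system-wide bound ni + n(n-1)/2,
- every window of n^2 consecutive moves contains a red move.

The function names are hypothetical. Passing runs are evidence, not a proof.

```python
import random

# Hypothetical indexing: process 0 is red; process i reads process (i - 1) mod n.

def random_moves(n, k, steps, seed):
    """Run a random scheduler from a random state; return who moved, in order."""
    rng = random.Random(seed)
    v = [rng.randrange(k) for _ in range(n)]
    movers = []
    for _ in range(steps):
        i = rng.randrange(n)
        if i == 0 and v[n - 1] == v[0]:      # red has the privilege
            v[0] = (v[0] + 1) % k
            movers.append(0)
        elif i > 0 and v[i - 1] != v[i]:      # green has the privilege
            v[i] = v[i - 1]
            movers.append(i)
    return movers

def check_move_bounds(n, movers):
    moves = [movers.count(p) for p in range(n)]
    # A green process makes at most one more move than its clockwise neighbor.
    assert all(moves[p] <= moves[p - 1] + 1 for p in range(1, n))
    # Total moves <= n*i + n(n-1)/2, where i is the red process's move count.
    assert len(movers) <= n * moves[0] + n * (n - 1) // 2
    # Every window of n^2 consecutive moves contains a red move.
    w = n * n
    assert all(0 in movers[s:s + w] for s in range(len(movers) - w + 1))
    return moves

if __name__ == "__main__":
    print(check_move_bounds(5, random_moves(5, 12, 3000, seed=0)))
```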
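Illegal States → Legal States
- Lemma 1: Regardless of the starting state, the system eventually reaches a state T where the red process has a different value from all other processes (though the system may not stay in such states)
  - Proof: Let Q be the red process. If in the starting state Q has the same value as some other process, then the n processes hold at most n-1 distinct values. Since k ≥ n, there must be an integer j (0 ≤ j ≤ k-1) that is not the value of any process; take j to be the first such value reached by counting up from Q’s value. Green processes only copy values from their neighbors, so no other process can take j before Q does. Q keeps moving, since it moves at least once in every n^2 moves. Hence Q will eventually take j as its value.
  - (It takes Q at most n moves to do so.)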
Illegal States → Legal States
- Lemma 2: If the system is in a state T where the red process has a different value from all other processes, then the system will eventually reach a state where all processes have the same value (though the system may not stay in such states)
- Theorem: Regardless of the initial state of the system, the system will eventually reach a legal state.
  - Proof: From Lemma 1 and Lemma 2, since a state where all values are the same is legal.
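Lemma 1, Lemma 2 and the theorem can be exercised together by trying every initial state of a small ring under a random scheduler. This is a sketch under stated assumptions: process 0 is red, process i reads process (i - 1) mod n, and k ≥ n as the algorithm requires. For each start, it records how many red moves had been made when the red value first differed from all others (Lemma 1 bounds this by n). It then checks that all values later become equal (Lemma 2). The names `stabilize` and `check_all_starts` are hypothetical.

```python
import random
from itertools import product

# Hypothetical indexing: process 0 is red; process i reads process (i - 1) mod n.

def stabilize(values, k, rng, max_steps=100_000):
    """Random scheduler from `values`. Returns (red moves made when the red value
    first differed from all others, whether all values later became equal)."""
    v = list(values)
    n = len(v)
    red_moves = 0
    unique_at = None
    for _ in range(max_steps):
        if unique_at is None and v[0] not in v[1:]:
            unique_at = red_moves                 # Lemma 1: a state T is reached
        if unique_at is not None and len(set(v)) == 1:
            return unique_at, True                # Lemma 2: all values are equal
        i = rng.randrange(n)
        if i == 0 and v[n - 1] == v[0]:
            v[0] = (v[0] + 1) % k
            red_moves += 1
        elif i > 0 and v[i - 1] != v[i]:
            v[i] = v[i - 1]
    return unique_at, False

def check_all_starts(n, k, seed=0):
    """Try every initial state of a small ring (assumes k >= n)."""
    rng = random.Random(seed)
    for start in product(range(k), repeat=n):
        unique_at, all_equal = stabilize(start, k, rng)
        assert unique_at is not None and unique_at <= n, start
        assert all_equal, start
    return True

if __name__ == "__main__":
    print(check_all_starts(4, 4))
```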
Self-Stabilizing Spanning Tree Algorithm
- Given n processes connected by an undirected graph and one special process P1, construct a spanning tree rooted at P1
  - Not all processes can communicate with all processes directly
- A very useful / practical algorithm
  - Can also be used to compute shortest paths (in number of hops)
[Figure: an example graph with root P1.]
Self-Stabilizing Spanning Tree Algorithm
- Each process maintains two variables
  - parent: who my parent is
  - dist: my distance to the root
- Runs in the background
  - parent and dist are continuously updated
- At any given point in time, the values of the two variables can be wrong
  - Due to "faults" such as topology changes resulting from node movement
Self-Stabilizing Spanning Tree Algorithm
- On P1 (executed periodically):
  - dist = 0; parent = -1;
- On all other processes (executed periodically):
  - Retrieve dist from all neighbors
  - Set my own dist = 1 + (the smallest dist received)
  - Set my own parent = my neighbor with the smallest dist (tie break if needed)
[Figure sequence: an example graph of processes P1 to P8, each labeled with its (dist, parent) pair. Red values are initially incorrect values; green values are values that have become correct. Over successive snapshots, P1 first sets itself to (0, -1) and the incorrect values are progressively replaced by correct ones.]
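A minimal Python sketch of the two rules follows. Because the slides' example graph cannot be recovered, it runs on a hypothetical 8-process graph in which process 1 plays the role of P1. It starts from arbitrary non-negative dist values and arbitrary parents. As simplifying assumptions, a random scheduler runs one atomic action per step (the proof also covers parallel actions), and ties are broken by the smaller process id. `GRAPH`, `act`, `shortest_hops` and `simulate` are hypothetical names.

```python
import random
from collections import deque

# Hypothetical example graph (the slides' figure is not reproduced here):
# undirected adjacency lists; process 1 plays the role of the root P1.
GRAPH = {
    1: [2, 3, 5], 2: [1, 4], 3: [1, 6], 4: [2, 6, 7],
    5: [1, 7], 6: [3, 4, 8], 7: [4, 5, 8], 8: [6, 7],
}
ROOT = 1

def act(p, dist, parent, graph, root):
    """One periodic action of process p, following the slide's rules."""
    if p == root:
        dist[p], parent[p] = 0, -1
        return
    # Neighbor with the smallest dist; ties broken by smaller id (an assumption).
    best = min(graph[p], key=lambda q: (dist[q], q))
    dist[p], parent[p] = 1 + dist[best], best

def shortest_hops(graph, root):
    """A_i: hop count of a shortest path from each process to the root (BFS)."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in level:
                level[w] = level[u] + 1
                queue.append(w)
    return level

def simulate(graph, root, steps, seed):
    """Start from arbitrary (non-negative) values; one random atomic action per step."""
    rng = random.Random(seed)
    nodes = list(graph)
    dist = {p: rng.randrange(10) for p in nodes}
    parent = {p: rng.choice(nodes) for p in nodes}
    for _ in range(steps):
        act(rng.choice(nodes), dist, parent, graph, root)
    return dist, parent

if __name__ == "__main__":
    dist, parent = simulate(GRAPH, ROOT, 2000, seed=42)
    print(dist == shortest_hops(GRAPH, ROOT))
    print(sorted(parent.items()))
```

After enough steps, each dist should equal the BFS hop count, and each parent should be a neighbor whose dist is one smaller.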
Correctness Proof
- Define a phase to be the minimum time period in which each process has executed its code at least once (called "has taken an action")
  - Some processes may execute their code more than once
  - The definition of a phase here is different from a round in synchronous systems!
- Let A_i be the length of the shortest path from process i to the root, and let dist_i be the value of dist on process i
  - dist_i is not allowed to be negative
Correctness Proof
- Lemma: At the end of phase 1, dist_1 = 0 and dist_i ≥ 1 for any i ≥ 2
- Lemma: At the end of phase 2,
  - Any process i whose A_i = 0 has dist_i = 0;
  - Any process i whose A_i = 1 has dist_i = 1;
  - Any process i whose A_i ≥ 2 has dist_i ≥ 2;
Correctness Proof
- Lemma: At the end of phase r,
  - Any process i whose A_i ≤ r-1 has dist_i = A_i;
  - Any process i whose A_i ≥ r has dist_i ≥ r;
- Prove by induction (the base case r = 1 is the phase-1 lemma): assume the lemma holds at the end of phase r, and consider phase r+1. We need to prove
  - Any process i whose A_i ≤ r-1 has dist_i = A_i;
  - Any process i whose A_i = r has dist_i = A_i;
  - Any process i whose A_i ≥ r+1 has dist_i ≥ r+1;
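The phase lemma can be checked empirically, including parallel actions. This sketch uses a hypothetical 8-process graph with process 1 as the root. It models parallelism by letting a random subset of processes act in each tick, all reading the same snapshot of dist values before anyone writes; the 0.3 selection probability is arbitrary. A phase ends at the first tick by which every process has acted since the phase began. At the end of each phase r, the code asserts both halves of the lemma. `check_phase_lemma` and the other names are hypothetical.

```python
import random
from collections import deque

# Hypothetical example graph; process 1 plays the role of the root P1.
GRAPH = {
    1: [2, 3, 5], 2: [1, 4], 3: [1, 6], 4: [2, 6, 7],
    5: [1, 7], 6: [3, 4, 8], 7: [4, 5, 8], 8: [6, 7],
}
ROOT = 1

def shortest_hops(graph, root):
    """A_i via breadth-first search from the root."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in level:
                level[w] = level[u] + 1
                queue.append(w)
    return level

def check_phase_lemma(graph, root, phases, seed):
    """Run `phases` phases with concurrent actions; at the end of every phase r,
    assert: A_i <= r-1 implies dist_i == A_i, otherwise dist_i >= r."""
    rng = random.Random(seed)
    nodes = list(graph)
    A = shortest_hops(graph, root)
    dist = {p: rng.randrange(10) for p in nodes}  # arbitrary start, never negative
    r, acted = 0, set()
    while r < phases:
        # Parallel actions: every process chosen in this tick reads one snapshot.
        chosen = [p for p in nodes if rng.random() < 0.3]
        snapshot = dict(dist)
        for p in chosen:
            dist[p] = 0 if p == root else 1 + min(snapshot[q] for q in graph[p])
        acted.update(chosen)
        if acted == set(nodes):  # every process has acted: this phase is over
            r, acted = r + 1, set()
            for p in nodes:
                if A[p] <= r - 1:
                    assert dist[p] == A[p], (r, p)
                else:
                    assert dist[p] >= r, (r, p)
    return dist

if __name__ == "__main__":
    H = max(shortest_hops(GRAPH, ROOT).values())
    print(check_phase_lemma(GRAPH, ROOT, H + 1, seed=0))
```

With H the largest A_i, the lemma at r = H+1 says every dist_i equals A_i, which is what the final phase checks.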
Correctness Proof
- Consider all t actions taken during phase r+1
  - We will use induction on t
- This proof is tricky if this is your first self-stabilization proof
  - A process may take multiple actions in a phase!
  - Processes may take actions in parallel: we cannot assume a serialization of all actions!
- The proof technique is typical for proving self-stabilization
  - Step 1: Prove that the t actions will not roll back what has already been achieved (no backward move)
  - Step 2: Prove that at some point, each node will achieve more (forward move)
  - Step 3: Prove that the t actions will not roll back the effects of the forward move (no backward move after the forward move)
Step 1: The t actions will not change the green conditions
Proof: Induction on t; consider action (t+1) by some process. (We cannot assume that action (t+1) happens after the t actions.) Regardless of what values the process draws from its neighbors, the action will not end up violating the condition.
For nodes with | Already know (end of phase r) | Want to show (end of phase r+1)
A_i ≤ r-1 | dist_i = A_i | dist_i = A_i
A_i = r | dist_i ≥ r | dist_i = A_i
A_i ≥ r+1 | dist_i ≥ r | dist_i ≥ r+1
(Green conditions: the "already know" column. Red conditions: the "want to show" column.)
Step 1 (continued): The t actions will not change the green conditions satisfied at the end of phase r
Proof (continued): True because a level A_i process only has neighbors from levels A_i - 1, A_i, and A_i + 1.
- A process with A_i ≤ r-1, other than the root, sees its level A_i - 1 neighbors report A_i - 1 and all other neighbors report at least A_i. It therefore sets dist_i = A_i again.
- A process with A_i ≥ r has every neighbor at level ≥ r-1, and each neighbor reports at least r-1. It therefore sets dist_i ≥ r.
Step 2: For each process, at some point during phase r+1, it will satisfy the red conditions
Proof: By the definition of a phase, each process takes at least one action during phase r+1, and by Step 1 the values it reads then satisfy the green conditions.
- A process with A_i = r reads r-1 from its level r-1 neighbors and at least r from all other neighbors, so it sets dist_i = r = A_i.
- A process with A_i ≥ r+1 reads at least r from every neighbor, so it sets dist_i ≥ r+1.
Step 3: For each process, after it first satisfies the red condition, it will continue to satisfy the red condition for the remainder of the phase
Proof: Trivial, but we do need to enumerate three cases
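Correctness Proof
- Theorem: After H+1 phases, dist_i = A_i on all processes
  - H is the length of the shortest path from the farthest process to the root (i.e., the largest A_i); apply the lemma with r = H+1
- Theorem: After H+1 phases, the dist and parent values on all processes are correct
  - Proof: Each process has a single parent pointer except the root. So the graph formed by the parent pointers has n nodes and n-1 edges. Following parent pointers from any process decreases dist by one at each hop until reaching the root (dist 0), so each process has a path to the root and the graph is connected. A connected graph with n nodes and n-1 edges is a tree, so the parent pointers form a spanning tree rooted at P1.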
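The tree argument can be checked mechanically on concrete parent pointers. This sketch verifies the conditions the proof relies on:

- the root's parent is -1;
- every other process points to exactly one real neighbor, giving n-1 edges;
- following parent pointers from any process reaches the root without cycling, so the graph is connected.

The graph and parent maps below are hypothetical examples, and `is_spanning_tree` is an illustrative name.

```python
def is_spanning_tree(graph, parent, root):
    """Check the slide's argument on concrete parent pointers: each non-root
    process has exactly one parent edge to a real neighbor (n - 1 edges), and
    following parent pointers from any process reaches the root (connected,
    no cycles). Such edges form a spanning tree rooted at `root`."""
    if parent.get(root) != -1:
        return False
    others = [p for p in graph if p != root]
    if any(parent.get(p) not in graph[p] for p in others):
        return False
    for p in others:
        seen, q = set(), p
        while q != root:       # walk toward the root
            if q in seen:      # a cycle that never reaches the root
                return False
            seen.add(q)
            q = parent[q]
    return True

# Hypothetical example graph (undirected adjacency lists) with root 1.
GRAPH = {
    1: [2, 3, 5], 2: [1, 4], 3: [1, 6], 4: [2, 6, 7],
    5: [1, 7], 6: [3, 4, 8], 7: [4, 5, 8], 8: [6, 7],
}

if __name__ == "__main__":
    tree = {1: -1, 2: 1, 3: 1, 5: 1, 4: 2, 6: 3, 7: 5, 8: 6}
    print(is_spanning_tree(GRAPH, tree, 1))
```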
Homework Assignment
- For the "rotating privilege algorithm": Consider a ring with 4 processes, and let k = 2. In other words, each process may have a value of 0 or 1. Construct a scenario where the algorithm will not stabilize
- Think about: For the self-stabilizing tree algorithm, assume now that the system is synchronous, and prove that the algorithm is self-stabilizing.