Chapter 3
Motivating Self-Stabilization

Self-Stabilization
Shlomi Dolev
MIT Press, 2000
Shlomi Dolev, All Rights Reserved ©
Chapter 3 - Motivating Self-Stabilization
3-1
Chapter 3: Motivating Self-Stabilization
• Converging to a desired behavior from any initial state enables the algorithm to recover from an arbitrary state caused by faults
• Why should one be interested in self-stabilizing algorithms?
  • They are applicable to distributed systems
  • Example: recovering from faults in a space shuttle. Faults may cause a malfunction for a while. Using a self-stabilizing algorithm for its control leads to automatic recovery and enables the shuttle to continue its task
What is a Self-Stabilizing Algorithm?
• This question will be answered using the “Stabilizing Orchestra” example
• The problem:
  • The conductor is unable to participate, so harmony is achieved by players listening to their neighboring players
  • It is a windy evening: the wind can turn some pages in the score, and the players may not notice the change
The “Stabilizing Orchestra” Example
• Our goal: to guarantee that harmony is achieved at some point following the last undesired page turn
• Imagine that the drummer notices that the violin player next to him is playing a different page. Candidate solutions and their problems:
  1. The drummer turns to his neighbor's page. What if the violin player noticed the difference as well?
  2. Both the drummer and the violin player start from the beginning. What if the player next to the violin player notices the change only after the other two have synchronized?
The “Stabilizing Orchestra” Example – the Self-Stabilizing Solution
• Every player joins the player who is playing the earliest page among his neighbors and himself
• Note that the score has a bounded length. What happens if a player goes back to the first page of the score before harmony is achieved? This case is discussed in detail in Chapter 6.
• In every sufficiently long period in which the wind does not turn a page, the orchestra resumes playing in synchrony
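The rule "join the earliest page among your neighbors and yourself" can be tried out in a few lines. The following is only a sketch under assumptions that are not on the slides: the players sit in a line, all of them act in synchronous rounds, each round advances every player by one page, and the score is long enough that nobody wraps around to the first page (the bounded-score case is left to Chapter 6). The function names are hypothetical.

```python
def play_round(pages):
    """One synchronous round: every player joins the earliest page among
    himself and his immediate neighbors, plays it, and moves to the next page."""
    joined = []
    for i in range(len(pages)):
        neighborhood = pages[max(0, i - 1): i + 2]  # left neighbor, self, right neighbor
        joined.append(min(neighborhood))
    return [page + 1 for page in joined]


def rounds_until_harmony(pages):
    """Repeat rounds (with no further page turns by the wind) until all
    players are on the same page. Returns (number of rounds, final pages)."""
    rounds = 0
    while len(set(pages)) > 1:
        pages = play_round(pages)
        rounds += 1
    return rounds, pages
```

For example, starting from pages `[10, 10, 10, 10, 2]` the earliest page spreads one seat per round, so the five players agree after four rounds. In this model the number of rounds is bounded by the distance between the two farthest players.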
Chapter 3: roadmap
3.1 Initialization of a Data-Link Algorithm in the
Presence of Faults
3.2 Arbitrary Configuration Because of Crashes
3.3 Frequently Asked Questions
The Data Link Algorithm
• Delivering a message is a complex task, during which the message may be corrupted or even lost
• The layers involved, at both the sender and the receiver: Network Layer, Data Link Layer, Physical Layer
[Figure: a packet from the network layer is wrapped with a head and a tail to form a frame in the data link layer; through the physical layer, the sender sends sequences of bits to the receiver.]
The alternating-bit algorithm
Used to cope with the possibility of frame corruption or loss.
Every message from the sender is repeatedly sent in a frame to the receiver until an acknowledgment arrives.

Sender
    initialization
    begin
        i := 1
        bit_s := 0
        send(bit_s, im_i)    (* im_i is fetched *)
    end    (* end initialization *)
    upon a timeout
        send(bit_s, im_i)
    upon frame arrival
    begin
        receive(FrameBit)
        if FrameBit = bit_s then    (* acknowledgment arrives *)
        begin
            bit_s := (bit_s + 1) mod 2
            i := i + 1
        end
        send(bit_s, im_i)    (* im_i is fetched *)
    end

Receiver
    initialization
    begin
        j := 1
        bit_r := 1
    end    (* end initialization *)
    upon frame arrival
    begin
        receive(FrameBit, msg)
        if FrameBit ≠ bit_r then
        begin
            bit_r := FrameBit
            j := j + 1
            om_j := msg
        end
        send(bit_r)    (* send acknowledgment *)
    end
The alternating-bit algorithm – run sample
[Figure: animated run between S and R. S sends <m1,0> and resends it upon a timeout; R receives m1 and acknowledges with <0>; R receives m1 again; S receives the ack and sends <m2,1>; R receives m2 and acknowledges with <1>. The figure tracks the values of bit_s and bit_r along the run.]
Once the sender receives an acknowledgment <1>, no
frame with sequence number 0 exists in the system
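A small simulation can make the retransmit-until-acknowledged behavior concrete. This is a minimal sketch, not the algorithm's definitive form: it assumes FIFO channels that may lose frames but never duplicate or reorder them, it models a timeout as "the sender retransmits once per step", and it starts from the proper initial state (sender bit 0, receiver bit 1). The names `run_alternating_bit`, `loss_prob`, `seed` and `max_steps` are hypothetical.

```python
import random
from collections import deque


def run_alternating_bit(messages, loss_prob=0.3, seed=0, max_steps=100_000):
    """Simulate the alternating-bit algorithm over lossy FIFO channels.
    Returns the receiver's output queue."""
    rng = random.Random(seed)
    to_receiver = deque()      # channel S -> R, carries (bit, msg) frames
    to_sender = deque()        # channel R -> S, carries acknowledgment bits
    i, bit_s = 0, 0            # sender state (i is a 0-based message index)
    bit_r, delivered = 1, []   # receiver state and its output queue

    def lossy_send(channel, frame):
        # Assumption: each frame is lost independently with probability loss_prob.
        if rng.random() >= loss_prob:
            channel.append(frame)

    for _ in range(max_steps):
        if i == len(messages):                 # every message was acknowledged
            break
        # Sender: timeout, retransmit the current frame.
        lossy_send(to_receiver, (bit_s, messages[i]))
        # Receiver: handle one arriving frame, if any.
        if to_receiver:
            frame_bit, msg = to_receiver.popleft()
            if frame_bit != bit_r:             # a new message, not a retransmission
                bit_r = frame_bit
                delivered.append(msg)
            lossy_send(to_sender, bit_r)       # acknowledge with the current bit
        # Sender: handle one arriving acknowledgment, if any.
        if to_sender:
            ack = to_sender.popleft()
            if ack == bit_s:                   # the current frame was acknowledged
                bit_s = (bit_s + 1) % 2
                i += 1
    return delivered
```

Under these assumptions each message is delivered exactly once and in order, regardless of which frames are lost. The sketch says nothing about what happens when the run starts from a corrupted state, which is the subject of the following slides.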
There Is No Data-Link Algorithm that Can Tolerate Crashes
• It is usually assumed that a crash causes the sender or the receiver to return to an initial state
• No initialization procedure can guarantee that every message fetched by the sender following the last crash will arrive at its destination
• The following execution demonstrates this point. Denote:
  • Crash_R: a receiver crash
  • Crash_S: a sender crash
  • Crash_X causes X to perform an initialization procedure
The Pumping Technique
Reference Execution (RE) = Crash_S, Crash_R, send_S(f_s1), receive_R(f_s1), send_R(f_r1), receive_S(f_r1), send_S(f_s2), … , receive_S(f_rk)
The idea: repeatedly crash the sender and the receiver and replay parts of RE in order to construct a new execution E’
[Animation: construction of E’ between S and R, with messages m1 and m2.]
• Suppose Crash_S and Crash_R occurred
• S sends f_s1 and crashes
• R receives f_s1, sends f_r1 and crashes
• S sends f_s1, receives f_r1, sends f_s2 and crashes
• R receives f_s1, sends f_r1, receives f_s2, sends f_r2 and crashes
• Continue with the same technique, until R receives f_sk and sends f_rk
• Now S and R crash
• If these frames are lost, no information about the message exists in the system
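The bookkeeping behind the pumping technique can be sketched in code. This is only an illustration under assumptions: the frames of RE are represented by the strings "fs1", …, "fsk" and "fr1", …, "frk", the channels are FIFO lists, and a crash simply restarts the replay of RE from its beginning. The function names `pump` and `final_replay` are hypothetical.

```python
def pump(k):
    """Replay ever longer prefixes of RE, crashing S and R in between.
    Returns the contents of the channels (S -> R, R -> S) at the end."""
    to_receiver, to_sender = [], []
    for t in range(1, k + 1):
        # S, freshly crashed, replays RE up to its t-th send,
        # consuming the acknowledgments pumped in the previous round.
        for n in range(1, t + 1):
            if n > 1:
                ack = to_sender.pop(0)
                assert ack == f"fr{n - 1}"
            to_receiver.append(f"fs{n}")
        # S crashes. R, freshly crashed, replays RE on the t frames in transit.
        for n in range(1, t + 1):
            frame = to_receiver.pop(0)
            assert frame == f"fs{n}"
            to_sender.append(f"fr{n}")
        # R crashes.
    return to_receiver, to_sender


def final_replay(k):
    """After a last crash of S and R, S replays all of RE using only the pumped
    acknowledgments, while R takes no step. Returns the frames S sent and the
    frames R received after its last crash."""
    to_receiver, to_sender = pump(k)
    received_by_r = []
    for n in range(1, k + 1):
        to_receiver.append(f"fs{n}")
        ack = to_sender.pop(0)
        assert ack == f"fr{n}"
    return to_receiver, received_by_r
```

After `pump(k)` the channel towards the sender holds fr1, …, frk although the receiver has just crashed. In `final_replay(k)` the sender sees exactly the acknowledgments of RE, so from its point of view the exchange completed, yet the receiver has received nothing since its last crash. If the frames still in transit are lost, no trace of the message remains.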
Conclusion
• It is possible to show that there is no guarantee that the k-th message will be received
• Instead, we require that eventually every message fetched by the sender reaches the receiver, which calls for a self-stabilizing data-link algorithm
Chapter 3: roadmap
3.1 Initialization of a Data-Link Algorithm in the
Presence of Faults
3.2 Arbitrary Configuration Because of Crashes
3.3 Frequently Asked Questions
Arbitrary Configuration Because of Crashes
• A combination of crashes and frame losses can bring the system to an arbitrary configuration, with arbitrary processor states
Any Configuration Can Be Reached by a Sequence of Crashes
• The pumping technique is used to reach any arbitrary configuration, starting from the reference execution
Reference Execution (RE) = Crash_S, Crash_R, send_S(f_s1), receive_R(f_s1), send_R(f_r1), receive_S(f_r1), send_S(f_s2), … , receive_S(f_rk)
• The technique is used to accumulate a long sequence of frames
Reaching an Arbitrary Configuration
• Our first goal: creating an execution in which RE appears i times in a row, (RE)^i
• Denote: F_rE (F_sE) is the sequence of frames sent by the receiver (sender) in RE
• F^i_rE (F^i_sE) is the sequence F_rE F_rE … F_rE (F_sE F_sE … F_sE), repeated i times
[Animation: first we use the pumping technique. S and R repeatedly replay parts of RE and crash, so that the frames f_s1, …, f_sk of F_sE and f_r1, …, f_rk of F_rE accumulate in transit. Continue with the same technique.]
• For any finite i, the technique can be extended to reach a configuration in which F^i_sE appears in q_s,r
Reaching an Arbitrary Configuration
• Our second goal: reaching c_a (an arbitrary configuration)
• Denote by k1 (k2) the number of frames in q_s,r (q_r,s) in c_a
• i = k1 + k2 + 2
• Using the previous technique we accumulate F^i_sE
• R replays RE k2+1 times
• S replays RE using the first F_rE until it reaches its desired state (losing the frames sent by it and the leftovers of F^k2_rE that are not in q_r,s)
• We do the same with R, reaching the arbitrary configuration c_a
[Figure: S moves to state S' and R to state R'; q_s,r holds F^(k1+1)_sE and q_r,s holds F^(k2+1)_rE.]
Crash-Resilient Data-Link Algorithm, with a Bound on the Number of Frames in Transit
• Crashes are not considered a severe type of fault (Byzantine faults are more severe, see Chapter 6)
• The algorithm uses the initialization procedure following the crashes of S and R
• bound: the maximal number of frames that can be in transit
• S crashes
• S, in its after-crash state, invokes a clean procedure
• S repeatedly sends <clean,1> until it receives <ackClean,1>, then repeatedly sends <clean,2> until it receives <ackClean,2>
• This continues until S receives <ackClean,bound+1>
• When the sender receives the first <ackClean,bound+1> it can be sure that the only label in transit is bound+1, and can initialize the alternating-bit algorithm (similarly, R can initialize as well)
[Figure: after Crash_S, S and R exchange <clean,1>, <clean,2>, …, <clean,bound+1> and <ackClean,1>, …, <ackClean,bound+1>, while an old frame <m,0> is still in transit; at the end bit_s = 0 and bit_r = 1.]
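The role of bound in the clean procedure can be checked with a short simulation. This is a sketch under assumptions that go beyond the slide: channels are FIFO and lose nothing during the procedure, at most bound stale frames are in transit in total when it starts, the receiver answers every <clean,l> with <ackClean,l> and ignores any other frame, and one frame per channel is handled per step. The name `clean_procedure` and the tuple encoding of frames are hypothetical.

```python
from collections import deque


def clean_procedure(stale_to_receiver, stale_to_sender, bound):
    """Run the sender's clean procedure starting from channels that hold
    arbitrary stale frames. Returns the frames still in transit (S -> R, R -> S)
    at the moment S receives <ackClean, bound+1> while using label bound+1."""
    assert len(stale_to_receiver) + len(stale_to_sender) <= bound
    to_receiver = deque(stale_to_receiver)
    to_sender = deque(stale_to_sender)
    label = 1
    while True:
        # S repeatedly sends the clean frame with its current label.
        to_receiver.append(("clean", label))
        # R handles the oldest frame in transit and acknowledges clean frames.
        kind, value = to_receiver.popleft()
        if kind == "clean":
            to_sender.append(("ackClean", value))
        # S handles the oldest acknowledgment in transit, if any.
        if to_sender:
            kind, value = to_sender.popleft()
            if kind == "ackClean" and value == label:
                if label == bound + 1:
                    return list(to_receiver), list(to_sender)
                label += 1
```

Stale <ackClean,l> frames may cause the sender to advance its label prematurely, but each stale frame can do so at most once. With at most bound stale frames and bound+1 labels, the acknowledgment for the last label is a genuine reply, and because the channels are FIFO every older frame has been flushed by the time it arrives.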
Crash-Resilient Data-Link Algorithm – R crashes
• R receives <msg,FrameBit>, assigns FrameBit to bit_r, and then delivers msg to the output queue
• R crashes while copies of <msg,FrameBit> are still being sent by S
• The problem: an extra copy of msg in the output queue
Crash-Resilient Data-Link Algorithm – R crashes
Can we guarantee at-most-once delivery, and exactly-once delivery after the last crash?
• The initialization of bit_r should ensure that a message fetched after the crash will be delivered
• A solution:
  • S sends each message in a frame with label 0 until an ack arrives, and then sends the same message with label 1 until an ack arrives
  • R delivers a message only from a frame with label 1 that arrives immediately after a frame with label 0
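The receiver's side of this solution can be sketched as follows. The code is an illustration under assumptions not stated on the slide: a crash erases the receiver's memory of the previous label but not its output queue, frames arrive in FIFO order, and the sender and the acknowledgments are left out. The class name `Receiver` and its methods are hypothetical.

```python
class Receiver:
    """Delivers a message only when a label-1 frame arrives immediately
    after a label-0 frame."""

    def __init__(self):
        self.prev_label = None   # label of the previously received frame
        self.output = []         # output queue (assumed to survive crashes)

    def crash(self):
        # Assumption: a crash resets the volatile state only.
        self.prev_label = None

    def on_frame(self, label, msg):
        if label == 1 and self.prev_label == 0:
            self.output.append(msg)
        self.prev_label = label
        return label             # the acknowledgment echoes the label
```

A short usage example: feeding the frames (0, "a"), (0, "a"), (1, "a"), (1, "a") delivers "a" once, because only the first label-1 frame follows a label-0 frame. If the receiver crashes after delivering "a" and another (1, "a") arrives, nothing is delivered again. A message whose label-0 frames were seen only before a crash may be lost, which is consistent with at-most-once delivery, while a message fetched after the last crash goes through both labels and is delivered exactly once.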
Chapter 3: roadmap
3.1 Initialization of a Data-Link Algorithm in the
Presence of Faults
3.2 Arbitrary Configuration Because of Crashes
3.3 Frequently Asked Questions
What is the rationale behind assuming that the states of the processors can be corrupted while the processors’ programs cannot?
• If the program is subject to corruption, any configuration is possible. The Byzantine model allows fewer than 1/3 of the processors to execute corrupted programs
• The program is stored in a long-term memory device, which makes it possible to
  1. Reload program statements periodically
  2. Protect the memory segment using a read-only memory device
Safety Properties
• A distributed algorithm should satisfy both safety and liveness properties
  • Safety ensures that bad configurations are avoided
  • Liveness ensures that the system's goal is achieved
• The designer of a self-stabilizing algorithm wants to ensure that even if the safety property is violated, the system execution will reach a suffix in which both properties hold
• What use is an algorithm that doesn’t ensure that a car never crashes?
  • If the faults are severe enough to make the algorithm reach an arbitrary configuration, the car may crash no matter which algorithm is chosen
Safety Properties …
A safety property for a car controller might be: never turn into a one-way road
• When no specification exists, the car can continue driving on this road and crash with other cars
• A self-stabilizing controller will recover from this non-legal init (by turning the car)
Processors Can Never Be Sure that a Safe Configuration Is Reached
What use is an algorithm in which the processors are never sure about the current global state?
• The question confuses the assumptions (the occurrence of transient faults) with the algorithm that is designed to fit these severe assumptions. A self-stabilizing algorithm can be designed to start in a particular (safe) state
• A self-stabilizing algorithm is at least as good as a non-self-stabilizing one for the same task, and is in fact much better!