Concurrent Reading and Writing using Mobile Agents

Hwajung Lee
Reaching agreement is a fundamental problem in distributed
computing. Some examples are
Leader election / Mutual Exclusion
Commit or Abort in distributed transactions
Reaching agreement about which process has failed
Clock phase synchronization
Air traffic control system: all aircrafts must have the same view
If there is no failure, then reaching consensus is trivial. All-to-all broadcast
Followed by a applying a choice function … Consensus in presence of
failures can however be complex.
input
output
p0
u0
v
p1
v
p2
u1
u2
p3
u3
v
v
Here, v must be equal to the value at some input line.
Also, all outputs must be identical.
Termination. Every non-faulty process must eventually decide.
Agreement. The final decision of every non-faulty process
must be identical.
Validity.
If every non-faulty process begins with the same
initial value v, then their final decision must be v.
Seven members of a busy household decided to hire a cook, since they do not
have time to prepare their own food. Each member separately interviewed
every applicant for the cook’s position. Depending on how it went, each
member voted "yes" (means “hire”) or "no" (means “don't hire”).
These members will now have to communicate with one another to reach a
uniform final decision about whether the applicant will be hired. The process
will be repeated with the next applicant, until someone is hired.
Consider various modes of communication…
Theorem.
In a purely asynchronous distributed system,
the consensus problem is impossible to solve
if even a single process crashes
Famous result due to Fischer, Lynch, Patterson
(commonly known as FLP 85)
Bivalent and Univalent states
A decision state is bivalent, if starting from that state, there exist
two distinct executions leading to two distinct decision values 0 or
1.
Otherwise it is univalent.
A univalent state may be either 0-valent or 1-valent.
Lemma.
No execution can lead from a 0-valent to a 1-valent
state or vice versa.
Proof.
Follows from the definition of 0-valent and 1-valent states.
Lemma. Every consensus protocol must have a bivalent initial state.
Proof by contradiction. Suppose not. Then consider the following input patterns:
s[0]
0 0 0 0 0 0 …0 0 0 {0-valent)
0 0 0 0 0 0 …0 0 1
0 0 0 0 0 0 …0 1 1
…
…
…
…
s[n-1] 1 1 1 1 1 1 …1 1 1 {1-valent}
s[j] is 0-valent
s[j+1] is 1-valent
(differ in jth position)
What if process (j+1) crashes at the first step?
Lemma.
In a consensus protocol,
starting from any initial
bivalent state, there must
exist a reachable bivalent
state T, such that every
action taken by some
process p in state T leads to
either a 0-valent or a 1-valent
state.
The adversary tries to prevent
The system from reaching
consensus
Q
bivalent
bivalent
bivalent
S
R
action 0
R0
o-valent
bivalent
U
action 1
R1
1-valent
bivalent
T
action 0
T0
o-valent
action 1
T1
1-valent
Actions 0 and 1 from T must be
taken by the same process p. Why?
Lemma.
In
a
consensus
Q
protocol,
starting from any initial bivalent
bivalent
state I, there must exist a
S
bivalent
bivalent
R
reachable bivalent state T,
action 0
such that every action taken by
R0
some process p in state T leads
o-valent
bivalent
U
action 1
R1
1-valent
bivalent
T
action 0
T0
o-valent
action 1
T1
1-valent
to either a 0-valent or a 1-valent
state.
Actions 0 and 1 from T must be
taken by the same process p. Why?
Assume shared memory communication.
Also assume that p ≠ q. Various cases are possible
1-valent
Case 1.
q writes
e1
T1
Decision =1
T0
Decision = 0
T
p reads
0-valent
e0
Such a computation must exist
since p can crash at any time
• Starting from T, let e1 be a computation that excludes any step by p.
• Let p crash after reading.Then e1 is a valid computation from T0 too.
To all non-faulty processes, these two computations are identical, but the
outcomes are different! This is not possible!
Case 2.
q writes
1-valent
T1
e1
Decision =1
T
p writes
T0
0-valent
Decision = 0
e0
Both write on the same variable, and p
writes first.
• From T, let e1 be a computation that excludes any step by p.
• Let p crash after writing.Then e1 is a valid computation from T0 too.
To all non-faulty processes, these two computations are identical,
but the outcomes are different!
Case 3
1-valent
q writes
T1
Decision =1
p writes
Z
T
p writes
q writes
T0
0-valent
Let both p and q write, but
Decision = 0
on different variables.
Then regardless of the order of these writes, both computations lead
to the same intermediate global state Z, which must be univalent.
Is Z 1-valent or 0-valent? Both are absurd
Similar arguments can be made for communication using
the message passing model too (See Lynch’s book). These
lead to the fact that p, q cannot be distinct processes, and
p = q. Call p the decider process.
What if p crashes in state T? No consensus is reached!

In a purely asynchronous system, there is no solution
to the consensus problem if a single process crashes..

Note that this is true for deterministic
algorithms only. Solutions do exist for the
consensus problem using randomized algorithm,
or using the synchronous model.
Describes and solves the consensus problem
on the synchronous model of communication.
Processor speeds have lower bounds and
communication delays have upper bounds.
- The network is completely connected
- Processes undergo byzantine failures, the worst
possible kind of failure
-

n generals {0, 1, 2, ..., n-1} decide about whether to "attack" or
to "retreat" during a particular phase of a war. The goal is to
agree upon the same plan of action.

Some generals may be "traitors" and therefore send either no
input, or send conflicting inputs to prevent the "loyal"
generals from reaching an agreement.

Devise a strategy, by which every loyal general eventually
agrees upon the same plan, regardless of the action of the
traitors.
Attack=1
{1, 1, 0, 0} 0
Attack = 1
1 {1, 1, 0, 1}
traitor
{1, 1, 0, 0} 2
Retreat = 0
The traitor
may send out
conflicting inputs
{1, 1, 0, 0}
3
Retreat = 0
Every general will broadcast his judgment to everyone else.
These are inputs to the consensus protocol.
We need to devise a protocol so that every peer
(call it a lieutenant) receives the same value from
any given general (call it a commander). Clearly,
the lieutenants will have to use secondary information.
Note that the roles of the commander and the
lieutenants will rotate among the generals.
commander
IC1. Every loyal lieutenant receives
the same order from the
commander.
IC2. If the commander is loyal, then
every loyal lieutenant receives
the order that the commander
sends.
lieutenants
Oral Messages
1. Messages are not corrupted in transit.
2. Messages can be lost, but the absence of message
can be detected.
3. When a message is received (or its absence is
detected), the receiver knows the identity of the
sender (or the defaulter).
OM(m) represents an interactive consistency protocol
in presence of at most m traitors.
Using oral messages, no solution to the Byzantine
Generals problem exists with three or fewer
generals and one traitor. Consider the two cases:
commander 0
1
1
commander 0
0
0
lieutenent 1
lieutenant 2
(a)
0
1
lieutenent 1
1
(b)
lieutenant 2
Using oral messages, no solution to the Byzantine Generals
problem exists with 3m or fewer generals and m traitors (m >
0).
Hint. Divide the 3m generals into three groups of m generals each, such that
all the traitors belong to one group. This scenario is no better than the case of
three generals and one traitor.
Recursive algorithm
OM(m)
OM(m-1)
OM(m-2)
OM(0)
OM(0) = Direct broadcast
OM(0)
1. Commander i sends out a value v (0 or 1)
2. If m > 0, then every lieutenant j ≠ i, after
receiving v, acts as a commander and
initiates OM(m-1) with everyone except i .
3. Every lieutenant, collects (n-1) values:
(n-2) values sent by the lieutenants using
OM(m-1), and one direct value from the
commander. Then he picks the majority of
these values as the order from i
commander
commander
0
0
1
1
1
1
22
1
3
1
1
1
1
2
3
3
1
(a)
1
0
0
1
2
1
0
22
3
1
1
0
0
2
3
3
1
(b)
1
1
1
2
Byzantine agreement algorithm for the
Oral Message model of communication
with at most m traitors
Recall what the oral message model is.
Recall the two interactive consistency
criteria IC1 & IC2.
Recursive algorithm
OM(m)
OM(m-1)
OM(m-2)
OM(0)
OM(0) = Direct broadcast
m= max number
of traitors
[1] Commander i sends out a value v (0 or 1)
[2] If m > 0, then every lieutenant j ≠ i, after
receiving v, acts as a commander and
initiates OM(m-1) with everyone except i .
[3] Every lieutenant, collects (n-1) values:
(n-2) values sent by the lieutenants using
OM(m-1), and one direct value from the
commander. Then he picks the majority of
these values as the order from i
commander
commander
0
0
1
1
1
1
22
1
3
1
1
1
1
2
3
3
1
(a)
1
0
0
1
2
1
0
22
3
1
1
0
0
2
3
3
1
(b)
1
1
1
2
Commander
OM(2)
0
v
v
1
2
v
OM(2)
v
v
3
v
4
6
5
OM(1)
v
4
5
v
6
v
OM(0)
v
5
v
5
2
v
6
v
v
2
v
v
6
v
v
6
2
v
2
5
v
4
v
v
6
v
2
OM(1)
v
4
OM(0)
5
Lemma.
Let the commander be
loyal, and n > 2m + k,
where m = maximum
number of traitors.
loyal commander
values received via OM(r)
Then OM(k) satisfies IC2
n-m-1 loyal lieutenants
m traitors
Proof
If k=0, then the result trivially holds.
Let it hold for k = r (r > 0) i.e. OM(r)
satisfies IC2. We have to show that
it holds for k = r + 1 too.
By definition n > 2m+ r+1, so n-1 > 2m+ r
So OM(r) holds for the lieutenants in
the bottom row. Each loyal lieutenant
will
collect n-m-1 identical good values and
m bad values. So bad values are voted
out (n-m-1 > m + r implies n-m-1 > m)
loyal commander
values received via OM(r)
n-m-1 loyal lieutenants
m traitors
Theorem. If n > 3m where m is the maximum number of
traitors, then OM(m) satisfies both IC1 and IC2.
Proof. Consider two cases:
Case 1. Commander is loyal. The theorem follows from
the previous lemma (substitute k = m).
Case 2. Commander is a traitor. We prove it by induction.
Base case. m=0 trivial.
(Induction hypothesis) Let the theorem hold for m = r.
We have to show that it holds for m = r+1 too.
There are n > 3(r + 1) generals and r + 1 traitors. Excluding
the commander, there are > 3r+2 generals of which there
are r traitors. So > 2r+2 lieutenants are loyal. Since 3r+ 2 >
3.r, OM(r) satisfies IC1 and IC2
> 2r+2
r traitors
In OM(r+1), a loyal lieutenant chooses the
majority from (1) > 2r+1 values obtained
from the loyal lieutenants via OM(r),
(2) the r values from the traitors, and
(3) the value directly from the commander.
> 2r+2
r traitors
The set of values collected in part (1) & (3) are the same for all loyal lieutenants –
it is the same set of values that these lieutenants received from the commander.
Also, by the induction hypothesis, in part (2) each loyal lieutenant receives
identical values from each traitor. So every loyal lieutenant collects the same set of values.
A signed message satisfies all the conditions of oral
message, plus two extra conditions


Signature cannot be forged. Forged message are
detected and discarded.
Anyone can verify its authenticity of a signature.
Signed messages improve resilience.
commander 0
1{0}
1{0}
commander 0
0{0,2}
0{0,2}
lieutenent 1
lieutenant 2
(a)
discard
0{0}
1{0}
lieutenent 1
1{0,1}
lieutenant 2
(b)
Using signed messages, byzantine
consensus is feasible with 3 generals
and 1 traitor
V{0}
1
V{0,1}
0
2
7
V{0,1,7}
V{0,1,7,4}
4
1. Commander i sends out a signed message v{i} to each lieutenant j ≠ i
2. Lieutenant j, after receiving v{S}, appends it to a set V.j, only if
(i) it is not forged, and (ii) it has not been received before.
3. If the length of S is less than m+1, then lieutenant j
(i) appends his own signature to S, and
(ii) sends out the signed message to every other lieutenant
whose signature does not appear in S.
4. Lieutenant j applies a choice function on V.j to make the final decision.
If n ≥ m + 2, where m is the maximum
number of traitors, then SM(m) satisfies
both IC1 and IC2.
Case 1.
Commander is loyal. The bag of
Each process will contain exactly one
message, that was sent by the commander.
Case 2. Commander is traitor.

The signature list has a size (m+1), and there are m traitors, so
at least one lieutenant signing the message must be loyal.

Every loyal lieutenant i will receive every other loyal lieutenant’s
message. So, every message accepted by j is also accepted by
i and vice versa. So V.i = V.j.

The signed message version tolerates a larger
number (n-2) of faults.

Message complexity however is the same in
both cases
Recall FLP’85 impossibility result. If crash failure
can be detected, then consensus is trivial!
In synchronous systems with bounded delay
channels, crash failures can definitely be detected
using timeouts.
But what about asynchronous systems? Can we
design a failure detector for purely asynchronous
distributed systems?
In asynchronous distributed systems, the detection of
crash failures is imperfect. This is why, consensus cannot
be solved. But how close can we get towards a perfect
failure detector? Two properties are relevant:
Completeness
Every crashed process is eventually suspected.
Accuracy
No correct process is suspected.
crashed
1
0
3
6
5
7
4
2
0 suspects {1,2,3,7} to have failed. Does this satisfy
completeness? Does this satisfy accuracy?
Strong completeness. Every crashed process is
eventually suspected by every correct process, and
remains a suspect thereafter.
Weak completeness. Every crashed process is
eventually suspected by at least one correct process,
and remains a suspect thereafter.
(These are liveness properties)
Strong accuracy. No correct process is ever
suspected.
Weak accuracy. There is at least one correct
process that is never suspected.
(These are safety properties)
Transforming Weak completeness into strong completeness
Program strong completeness (program for process i};
define D: set of process ids (representing the suspects);
initially D is generated by the weakly complete failure detector;
do true 
send D(i) to every process j ≠ i;
receive D(j) from every process j ≠ i;
D(i) := D(i)  D(j);
if j  D(i)  D(i) := D(i) \ j fi
od
Accuracy is a safety property. A failure detector is
eventually strongly accurate, if there exists a time T
after which no correct process is suspected.
(Before that time, a correct process be added to and
removed from the list of suspects any number of times)
A failure detector is eventually weakly accurate, if
there exists a time T after which at least one process is
no more suspected.
strong accuracy weak accuracy ◊ strong accuracy ◊ weak accuracy
strong completeness
weak completeness
Perfect P Strong S
Weak W
◊P
◊S
◊W
Perfect P. (Strongly) Complete and strongly accurate
Strong S. (Strongly) Complete and weakly accurate
Eventually perfect ◊P.
(Strongly) Complete and eventually strongly accurate
Eventually strong ◊S
(Strongly) Complete and eventually weakly accurate
Question 1. Given a failure detector of a certain type,
how can we solve the consensus problem?
Question 2. How can we implement these classes of
failure detectors in asynchronous distributed
systems?
Question 3. What is the weakest class of failure
detectors that can solve the consensus problem?
(Weakest class of failure detectors is
closer to reality)
input
1
2
3
4
output
Agreed value
{program for process p, t = max number of faulty processes}
initially Vp := (, , , …, ); Dp := Vp;
{Vp[q] ≠  means, process p thinks q is a suspect}
{Phase 1} for round rp= 1 to t +1
send (rp, Dp, p) to all;
wait to receive (rp, Dq, q) from all q, {or else q becomes a suspect};
for k = 1 to n Vp[k] =    (rp, Dq, q): Dq[k] ≠   Vp[k] := Dq[k] end for
end for
{at the end of Phase 1, Vp for each correct process is identical}
{Phase 2} Final decision value is the first element Vp[j]: Vp[j] ≠ 
Q. What is there is no suspect?
It is possible that a process p sends out the first message
to q and then crashes. If there are n processes and t of
them crashed, then after at most (t + 1) asynchronous
rounds, Vp for each correct process p becomes identical,
and contains all inputs from processes that may have
transmitted at least once. The choice function leads to
a unique decision.
Sends (2, Dj) and
then crashes
Sends (1, Di) and
then crashes
i
Sends (t, Dk) and
then crashes
ij
k
l
l
Sends (t+1,Dl)
Completely connected topology
l
l