Phase 1

Revisiting failure detectors
Some of you asked how implementing consensus using S differs
from reaching consensus using P. Here it is.
Recall the definition of S (strong) FD:
Strong completeness + weak accuracy
Consensus using S
{Program for process p}
Vp := (⊥, ⊥, ..., ⊥); Vp[p] := input of p; Dp := Vp
(Phase 1) Same as phase 1 of consensus with P
(Phase 2)
send (Vp, p) to all;
receive (Dq, q) from all q, or q is a suspect;
k := 1;
do k ≠ n →
   if ∃ Vq[k]: Vp[k] ≠ ⊥ ∧ Vq[k] = ⊥ → Vp[k] := Dp[k] := ⊥ fi;
   k := k + 1
od
(Phase 3)
Decide on the first element Vp[j]: Vp[j] ≠ ⊥
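The Phase 2 and Phase 3 steps above can be sketched in Python. This is only an illustration under stated assumptions: `BOTTOM`, `phase2`, and `phase3` are hypothetical names, `None` stands for the undefined entry ⊥, and `received` is assumed to hold the vectors of exactly those processes that p did not suspect.

```python
BOTTOM = None  # stands for the undefined entry ⊥ (an assumption of this sketch)

def phase2(v_p, received):
    """Clear every entry of v_p that some received vector reports as undefined."""
    result = list(v_p)
    for k in range(len(result)):
        # If any non-suspected process q has Vq[k] = ⊥, p sets Vp[k] to ⊥ as well.
        if result[k] is not BOTTOM and any(v_q[k] is BOTTOM for v_q in received):
            result[k] = BOTTOM
    return result

def phase3(v_p):
    """Decide on the first defined entry of v_p (BOTTOM if there is none)."""
    for value in v_p:
        if value is not BOTTOM:
            return value
    return BOTTOM

# Hypothetical vectors after Phase 1 for three processes.
v0 = ["a", "b", BOTTOM]
v1 = ["a", BOTTOM, BOTTOM]
v2 = ["a", "b", "c"]

v0_after = phase2(v0, [v1, v2])   # entry 1 is cleared because v1[1] is undefined
decision = phase3(v0_after)
```

In this run only entry 0 survives in `v0_after`, so the process decides on the value at index 0.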
Example
[Figure: for each process, its list of suspects and its vector V (entries indexed 0 1 2 3 4) after Phase 1 and after Phase 2.]
Process   List of suspects
0         {1, 4}   (process 0 is never suspected)
1         {2, 4}
2         {4}
3         {2, 4}
4         crashed
Atomic Commit Protocols
[Figure: a network of servers S1, S2, S3. Servers may crash.]
The initiator of a transaction is called the coordinator,
and the remaining servers are participants.
Requirements of Atomic Commit Protocols
Termination. All non-faulty servers must eventually reach an
irrevocable decision.
Agreement. If any server decides to commit, then every server must
have voted to commit.
Validity. If all servers vote commit and there is no failure, then all
servers must commit.
One-phase Commit
[Figure: a client, a coordinator server, and three participant servers.]
If a participant deadlocks or faces a problem, the
coordinator may never find out about it. Too simplistic.
Two-phase commit (2PC)
Phase 1: The coordinator sends VOTE to the participants and receives
yes / no from them.
Phase 2:
if ∀ server j: vote(j) = yes → multicast COMMIT to all servers
[] ∃ server j: vote(j) = no → multicast ABORT to all servers
fi
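The two phases can be sketched in Python for the failure-free case. This is a minimal illustration, not a full protocol: `coordinator_decision` and `two_phase_commit` are hypothetical names, and each participant is modeled as a function that returns its vote.

```python
def coordinator_decision(votes):
    """Return "COMMIT" only if every vote is "yes"; otherwise "ABORT"."""
    return "COMMIT" if all(v == "yes" for v in votes.values()) else "ABORT"

def two_phase_commit(participants):
    """participants maps a server name to a function that returns "yes" or "no"."""
    # Phase 1: the coordinator sends VOTE and collects yes / no from everyone.
    votes = {name: vote() for name, vote in participants.items()}
    # Phase 2: the coordinator multicasts the same decision to all servers.
    decision = coordinator_decision(votes)
    return {name: decision for name in participants}

all_yes = two_phase_commit({"S1": lambda: "yes", "S2": lambda: "yes"})
one_no = two_phase_commit({"S1": lambda: "yes", "S2": lambda: "no"})
```

A single "no" vote is enough to make every server abort; the sketch deliberately ignores crashes and lost messages.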
What if failures occur?
Failure scenarios in 2PC
(Phase 1)
Fault: Coordinator did not receive YES / NO, OR participant did not receive VOTE.
Solution: Broadcast ABORT; abort local transactions.
Failure scenarios in 2PC
(Phase 2)
(Fault) A participant does not receive a COMMIT or ABORT
message from the coordinator (it may be the case that the
coordinator crashed after sending ABORT or COMMIT to a fraction
of the servers). It then remains undecided until the coordinator
is repaired and reinstalled into the system.
This blocking is a known weakness of 2PC.
Coping with blocking in 2PC
A non-faulty participant can ask other participants which
message (COMMIT or ABORT) they received from the
coordinator, and take appropriate action.
But what if no non-faulty participant received anything?
Who knows if the coordinator committed or aborted the
local transaction before crashing? Continue to wait …
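The idea of asking peers can be sketched as follows. The function name `resolve_by_asking_peers` and the `"WAIT"` result are assumptions of this sketch; each answer is the message a peer received from the coordinator, or `None` if it received nothing.

```python
def resolve_by_asking_peers(peer_answers):
    """Adopt the first COMMIT/ABORT some peer reports; otherwise keep waiting."""
    for answer in peer_answers:
        if answer in ("COMMIT", "ABORT"):
            return answer
    # No peer received anything from the coordinator: the participant stays blocked.
    return "WAIT"

unblocked = resolve_by_asking_peers([None, "COMMIT", None])
still_blocked = resolve_by_asking_peers([None, None])
```

The second call shows the case raised above: when nobody received the decision, asking around does not remove the blocking.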
Non-blocking Atomic Commit
A blocking protocol has the potential to prevent non-faulty
participants from reaching a final decision.
A solution to the atomic commitment problem is called
non-blocking if, in spite of server crashes, every non-faulty
participant eventually decides.
One solution is to impose the requirement of uniform
agreement.
Uniform agreement
If any participant (faulty or not) delivers a message m
(commit or abort) then all correct processes eventually
deliver m.
To implement uniform agreement, no server should deliver a
COMMIT or ABORT message until it has relayed it to all other
servers.
If a process times out in phase 2, then it decides abort.
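The relay-before-deliver rule can be sketched with a small in-memory simulation. The `Server` class and its fields are hypothetical, message passing is modeled as direct method calls, and timeouts are not modeled.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.peers = []        # the other servers, filled in after construction
        self.seen = False      # True once this server has started handling m
        self.delivered = None  # the message delivered locally, if any

    def receive(self, message):
        if self.seen:
            return             # already relayed this message; ignore the copy
        self.seen = True
        for peer in self.peers:
            peer.receive(message)    # relay to all other servers first
        self.delivered = message     # deliver only after relaying

servers = [Server("S1"), Server("S2"), Server("S3")]
for s in servers:
    s.peers = [p for p in servers if p is not s]

servers[0].receive("COMMIT")   # the coordinator's message reaches S1 only
delivered = [s.delivered for s in servers]
```

Because every server relays before it delivers, a message delivered anywhere has already been sent to everyone else.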
Recovery: Stable storage
Creates the illusion of incorruptible storage, even if
a writer or a disk crashes at any time. The implementation
uses at least two independent disks.
[Figure: process P updates and process Q inspects the two disk copies A0 and A1.]
Stable storage
To write, do the following:
copy on disk A0;
record timestamp T0;
compute checksum S0;
copy on disk A1;
record timestamp T1;
compute checksum S1
Readers check four cases:
Both checksums OK and T1>T0
Both checksums OK and T1<T0
Checksum on A1 wrong
Checksum on A0 wrong
(Which copy to accept in each case?)
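One plausible answer to the question above, sketched in Python: accept the copy with a valid checksum, and if both are valid, the one with the later timestamp. This is an assumption of the sketch rather than a rule stated on the slide. The dictionary layout, the integer `clock`, and the function names are hypothetical; `zlib.crc32` stands in for the checksum.

```python
import zlib

def make_copy(data, timestamp):
    """One disk copy: the data plus its timestamp and checksum."""
    return {"data": data, "timestamp": timestamp, "checksum": zlib.crc32(data)}

def is_valid(copy):
    return copy is not None and zlib.crc32(copy["data"]) == copy["checksum"]

def stable_write(disks, data, clock):
    disks[0] = make_copy(data, clock)        # copy on A0, record T0 and S0
    disks[1] = make_copy(data, clock + 1)    # copy on A1, record T1 and S1

def stable_read(disks):
    valid = [c for c in disks if is_valid(c)]
    if not valid:
        raise IOError("both copies are corrupt")
    # Among the valid copies, accept the one written last.
    return max(valid, key=lambda c: c["timestamp"])["data"]

disks = [None, None]
stable_write(disks, b"v1", clock=1)
normal_read = stable_read(disks)

disks[0] = make_copy(b"v2", 10)      # writer crashed after updating A0 only
read_after_crash = stable_read(disks)

disks[1]["data"] = b"garbage"        # A1 is corrupted: its checksum no longer matches
read_with_bad_a1 = stable_read(disks)
```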
Checkpointing

Mechanism for (backward) error
recovery. Transaction states are
periodically stored on stable
storage. Following a failure, the
transaction rolls back to the
nearest checkpoint.

Independent (unsynchronized) or
coordinated (synchronized)
checkpointing
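Rolling back to the nearest checkpoint can be sketched for a single transaction. The `Transaction` class is hypothetical, and an in-memory list stands in for stable storage, which a real system would not use.

```python
import copy

class Transaction:
    def __init__(self, state):
        self.state = state
        self.checkpoints = []   # stand-in for stable storage (assumption)

    def checkpoint(self):
        # Store a private copy so later updates cannot alter the checkpoint.
        self.checkpoints.append(copy.deepcopy(self.state))

    def rollback(self):
        # Restore the most recent (nearest) checkpoint.
        self.state = copy.deepcopy(self.checkpoints[-1])

t = Transaction({"balance": 100})
t.checkpoint()
t.state["balance"] = 40      # work done after the checkpoint
t.rollback()                 # a failure occurs: the work is undone
```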
Classification of checkpointing
Coordinated Checkpointing takes a consistent snapshot.
Has some overhead.
Uncoordinated checkpointing apparently has no overhead.
But it may have some efficiency problems.
[Figure: processes P, Q, and R with checkpoints p0, p1, p2; q0, q1, q2; and r0, r1, r2.]
Checkpointing (continued)
Some actions can be reversed, but some cannot
(like dispensing cash from an ATM or
printing a document).
Such actions are logged, and during replay, the logs
substitute for the real actions.
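A minimal sketch of replaying from a log, assuming each irreversible action is a function whose result is recorded the first time it runs. The names `run`, `log`, and `dispense_cash` are hypothetical.

```python
def run(actions, log):
    """Perform each action once; during replay, reuse the logged result instead."""
    outputs = []
    for index, action in enumerate(actions):
        if index < len(log):
            outputs.append(log[index])   # replay: the log substitutes for the action
        else:
            result = action()            # first execution: perform the real action
            log.append(result)
            outputs.append(result)
    return outputs

dispensed = []

def dispense_cash():
    dispensed.append(20)                 # the irreversible side effect
    return "dispensed 20"

log = []
first = run([dispense_cash], log)        # original execution
second = run([dispense_cash], log)       # replay after a rollback
```

After both runs the cash has been dispensed only once, while the replayed transaction still observes the same result.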
Group Communication
Group oriented activities are steadily increasing.
There are many types of groups:
- Open and closed groups
- Peer-to-peer and hierarchical groups
Major issues
- Atomic multicast
- Ordered multicast
- Dynamic groups
- Failure handling

Atomic multicast

A multicast is called atomic, when the message is
delivered to every correct (i.e. functioning) member, or to
no member at all.

Sometimes, certain features available in the infrastructure
of a distributed system simplify the implementation of
multicast. Examples are (1) multicast on an Ethernet LAN
and (2) IP multicast.
Basic vs. reliable multicast
Basic multicast does not consider crash failures.
Reliable multicast does.
Three criteria for basic multicast:
Liveness. Each process must receive every message.
Integrity. No spurious message is received.
No duplicate. Each process accepts exactly one copy of a message.
Reliable atomic multicast
Sender's program
i := 0;
do i ≠ n →
   send message to i;
   i := i+1
od
Receiver's program
if m is new →
   accept it;
   multicast m;
[] m is duplicate → discard m
fi
Tolerates process crashes.