Building Oracles for Robust Algorithm Design

Packet Efficient Implementation of the
Omega (Ω) Failure Detector
Lyon, France
Quentin Bramas
Dianne Foreback
Mikhail Nesterenko
Sébastien Tixeuil
November 8, 2016
What is Omega?
Oracle
• Abstract construct that enables a solution to an otherwise unsolvable problem in a
particular distributed system model
• Encapsulates the portion of the solution that is impossible to implement
Consensus requires non-crashed (correct) processes to agree on a value
• FLP [11]: Consensus is impossible to solve in the asynchronous system model even if a
single process may crash, since it is impossible to know whether a process crashed or is merely slow
⇒ requires an oracle
Omega: an oracle that eventually outputs the same correct leader at every correct process
• Weakest failure detector to solve consensus in the asynchronous system model [6]
• Encapsulates the least amount of synchrony needed to solve Consensus
Notation
• Message – contents to be distributed to other processes (using packets)
• Packet – portion of data transmitted across a channel
• Fair-lossy channel - if an infinite number of messages are sent along this channel, an
infinite number of messages are received
Implementation efficiency metrics
• Message Efficient Implementation: only a single process sends an infinite number of
messages and all but finitely many messages are of constant size
• Packet Efficient Implementation: all but finitely many messages are transmitted using
O(n) packets each
  – Packets of different messages may potentially use different channels
⇒ the number of used channels is not limited
• Super Packet Efficient Implementation: requires only O(n) channels to transmit the
packets of an infinite number of messages
⇒ the number of used channels is limited
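The two efficiency notions count different things: packets per message versus distinct channels overall. A minimal Python sketch that checks both on a finite suffix of a packet trace, assuming a hypothetical trace format of (message_id, sender, receiver) tuples and a hypothetical constant c standing in for the O(n) bound. The definitions concern all but finitely many messages of an infinite execution, so a finite check can only illustrate them, not establish them.

```python
from collections import defaultdict

# Hypothetical trace format (not from the paper): one (message_id, sender, receiver)
# tuple per packet observed on a channel.
Trace = list[tuple[int, str, str]]


def packets_per_message(trace: Trace) -> dict[int, int]:
    """Number of packets each message used."""
    counts: dict[int, int] = defaultdict(int)
    for msg_id, _src, _dst in trace:
        counts[msg_id] += 1
    return dict(counts)


def channels_used(trace: Trace) -> set[tuple[str, str]]:
    """Distinct directed channels (sender, receiver) that carried at least one packet."""
    return {(src, dst) for _msg, src, dst in trace}


def looks_packet_efficient(trace: Trace, n: int, c: int = 2) -> bool:
    """Every message in this finite suffix uses at most c * n packets."""
    return all(k <= c * n for k in packets_per_message(trace).values())


def looks_super_packet_efficient(trace: Trace, n: int, c: int = 2) -> bool:
    """Packet efficient, and the whole suffix uses at most c * n distinct channels."""
    return looks_packet_efficient(trace, n, c) and len(channels_used(trace)) <= c * n


# Example: 4 processes; message k is "shouted" by a different process each time.
procs = ["a", "b", "c", "d"]
trace = [(k, procs[k % 4], q) for k in range(8) for q in procs if q != procs[k % 4]]
print(looks_packet_efficient(trace, n=4))        # True: 3 packets per message
print(looks_super_packet_efficient(trace, n=4))  # False: all 12 channels are used
```

Rotating the shouting process keeps each message cheap but eventually touches every channel, which is exactly the gap between the two notions.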
Omega Implementations
Implementation conditions
• Omega is impossible to implement in the asynchronous system model
• Must strengthen the system model with added synchrony and reliability assumptions
• What is the least restrictive communication model to implement Omega?
Aguilera et al. [1] algorithm
• Requires at least one process to have eventually timely
channels to all correct processes
• Requires a (possibly different) single process to have fair-lossy channels to/from all others
(Figure legend: timely channel; channel that drops all packets)
• Message efficient
Delporte-Gallot et al. [9] algorithm determines the timely channel graph
• Can be used to construct Omega
• Requires at least one process to have timely paths to all correct processes
• Requires all channels to be reliable
• Super packet efficient
If channel timeliness/reliability is arbitrary,
what are the necessary and sufficient conditions for implementing Omega?
Single-hop vs. Multi-hop
Theorems 1 and 2 (summarized): If each channel is timely with probability p < 1,
then, as network size n grows, the probability of leader existence using
• Single-hop direct communication: approaches zero exponentially fast
• Multiple-hop communication: approaches one exponentially fast
Theorems 3 and 4 (summarized): The expected time a leader persists while the
timeliness of channels changes
• Single-hop: tends to zero
• Multiple-hop: tends to infinity
Practically speaking, a multi-hop Omega implementation is far more likely to succeed in
establishing a persistent leader
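The trend in Theorems 1 and 2 can be illustrated with a Monte Carlo estimate. This sketch assumes a simplified static model, not the paper's proof: each directed channel is independently timely with probability p, a single-hop leader needs a direct timely channel to every other process, and a multi-hop leader needs timely paths (found by BFS) to every other process. All function names are illustrative.

```python
import random
from collections import deque


def random_timely_graph(n, p, rng):
    """Directed channel u -> v is timely independently with probability p (assumed model)."""
    return {u: [v for v in range(n) if v != u and rng.random() < p] for u in range(n)}


def single_hop_leader_exists(g, n):
    """Some process has a direct timely channel to every other process."""
    return any(len(g[u]) == n - 1 for u in range(n))


def multi_hop_leader_exists(g, n):
    """Some process reaches every other process over timely paths (BFS from each root)."""
    for root in range(n):
        seen, queue = {root}, deque([root])
        while queue:
            for v in g[queue.popleft()]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        if len(seen) == n:
            return True
    return False


def estimate(n, p, trials=500, seed=0):
    """Fraction of random graphs in which a single-hop / multi-hop leader exists."""
    rng = random.Random(seed)
    single = multi = 0
    for _ in range(trials):
        g = random_timely_graph(n, p, rng)
        single += single_hop_leader_exists(g, n)
        multi += multi_hop_leader_exists(g, n)
    return single / trials, multi / trials


for n in (5, 10, 20, 40):
    s, m = estimate(n, p=0.5)
    print(f"n={n:2d}  single-hop={s:.3f}  multi-hop={m:.3f}")
```

Because both estimates use the same random graphs, the single-hop estimate never exceeds the multi-hop one: a direct timely channel to every process is also a timely path.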
Our Deterministic Results
• Prove necessary conditions for Omega
  – At least one process (the leader) needs a timely path
to every correct process
  – All processes need a fair-lossy path to the leader
• Present an algorithm MPO that implements Omega under these
conditions
⇒ the conditions are necessary and sufficient
• Prove that no super packet efficient algorithm exists
⇒ the channel reliability requirement of the Delporte-Gallot et al. algorithm
is necessary
Infinite Number of Messages
Theorem 5 and Corollary 1 (summarized): In an implementation of Omega, at least one
process needs to send infinitely many timely messages. This process must be the leader.
Intuition: The leader needs to periodically inform the other
processes that it is correct; otherwise they cannot detect its
crash
Why? A crashed process cannot be distinguished from a slow one,
so leadership may toggle between processes:
• Let process x be the leader
• Let v, w and z crash
• If x stops sending timely messages, y thinks x crashed and
elects itself as leader
• Messages are then delivered arbitrarily, and x now picks y as leader
• Repeat as above but with y
No Channel Reliability/Timeliness
Lemma 1 and Corollary 2 (summarized): If channels have no reliability/timeliness
properties, then to deliver a message timely
• Each recipient must send it across every outgoing channel (except possibly the
channels to the leader and to the sender)
• Ω(n²) packets are needed (not packet efficient)
Intuition: Since channel properties cannot be established, packets have to be sent everywhere;
otherwise a process might skip the channel that lies on the only timely path to some process
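The packet cost behind Corollary 2 can be counted directly. A sketch assuming a complete graph and a schedule in which every process first receives the message directly from the leader; each process then forwards it once on every outgoing channel except those to the leader and to the sender. Under these assumptions flooding costs (n − 1)² packets per message, against n − 1 for forwarding along a known arborescence. The function names are hypothetical.

```python
from collections import deque


def flood_packets(n, leader=0):
    """Packets used when the leader's message is flooded over a complete graph.

    Each process forwards once, on first receipt, on every outgoing channel except
    the channels to the leader and to the process it received the message from.
    """
    packets = 0
    received_from = {leader: None}
    queue = deque([leader])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if v == u or v == leader or v == received_from[u]:
                continue
            packets += 1                   # one packet on channel u -> v
            if v not in received_from:     # first receipt: v will forward too
                received_from[v] = u
                queue.append(v)
    return packets


def tree_packets(n):
    """Forwarding along a spanning arborescence: one packet per non-root process."""
    return n - 1


for n in (5, 10, 20, 40):
    print(n, flood_packets(n), tree_packets(n))
```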
Fair-Lossy Path
Lemma 2: In any message efficient implementation of Omega, each correct process
must have a fair-lossy path to the leader
Intuition: each process must be able to notify the leader when the leader is no longer timely
• Let process x be the leader
• If y no longer considers x its leader and receives no other messages, y elects itself leader
• y has no fair-lossy path to the others, so they never learn of y's decision
• Thus, there is no agreement on the leader
Eventually Timely Paths
Theorem 6: In a message and packet efficient implementation of Omega
• At least one process x must have an eventually timely path to all correct processes
• Every correct process must have a fair-lossy path to the leader
Intuition:
• The leader x must send an infinite number of messages (so that others do not think it
crashed)
• If a number of processes proportional to n lack timely incoming channels, then all
processes must forward each message
• ⇒ (by Lemma 1) the number of packets required for each message is in Ω(n²), a
contradiction with packet efficiency
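The two conditions of Theorem 6 and Lemma 2 are graph reachability properties, which a short Python sketch can check on a static snapshot. It assumes hypothetical inputs: the set of correct processes and two adjacency maps restricted to correct processes, one for eventually timely channels and one for fair-lossy channels (with timely channels also listed as fair-lossy). A fair-lossy path from every correct process to the leader is checked as reachability from the leader in the reversed fair-lossy graph.

```python
from collections import deque


def reachable(adj, src):
    """Vertices reachable from src in a directed graph {u: [v, ...]}."""
    seen, queue = {src}, deque([src])
    while queue:
        for v in adj.get(queue.popleft(), ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def reverse(adj):
    """Same graph with every channel u -> v turned into v -> u."""
    rev = {}
    for u, vs in adj.items():
        for v in vs:
            rev.setdefault(v, []).append(u)
    return rev


def omega_conditions_hold(leader, correct, timely, fair_lossy):
    """Both necessary conditions, checked on a static snapshot for `leader`."""
    # 1. Timely paths from the leader to every correct process.
    timely_ok = correct <= reachable(timely, leader)
    # 2. Fair-lossy paths from every correct process to the leader, i.e. the leader
    #    reaches every correct process once fair-lossy channels are reversed.
    lossy_ok = correct <= reachable(reverse(fair_lossy), leader)
    return timely_ok and lossy_ok


correct = {"x", "y", "z"}
timely = {"x": ["y"], "y": ["z"]}
fair_lossy = {"x": ["y"], "y": ["z"], "z": ["x"]}
print(omega_conditions_hold("x", correct, timely, fair_lossy))  # True
print(omega_conditions_hold("y", correct, timely, fair_lossy))  # False: no timely path y -> x
```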
No Super Packet Efficiency
Theorem 7: Even if the leader has an eventually timely path to every correct
process and every correct process has a fair-lossy path to the leader, there does not
exist a message and super packet efficient implementation of Omega
(Figure: two timely components, S1 with leader L1 and S2 with leader L2, connected by an
eventually timely channel and an eventually reliable channel)
Intuition:
• Assume each component contains the same number of processes
• If there is initially no communication between the processes of the two components,
processes in S1 assume those in S2 are crashed and vice versa
• Processes in S1 elect L1 from S1 as their leader and processes in S2 elect L2 from
S2 as their leader
• The channels between the components become timely and reliable only after the leaders are elected
Thus, the conditions of the theorem hold, yet the processes disagree on the leader
MPO Algorithm: Features
• Each leader candidate estimates the reliability of channels by building an
arborescence of timely channels
• Sends packets over these channels
• Receives failed messages if a channel is untimely
• The process with the lowest-weight arborescence wins
• Channel weights are used to evaluate channel reliability
• Proceeds in phases to preserve efficiency
• Messages sent an infinite number of times are of constant size
(Figure: x sends alive(x, phase#, shoutID) to y, w, z, v along its arborescence and
broadcasts startPhase(x, phase#, arb))
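The slides state that the process with the lowest-weight arborescence wins, but not how that weight is computed. This sketch assumes that an arb's weight is the sum of its edge weights, that a higher edge weight means a less timely channel, and that ties are broken by process id. It finds the minimum by brute force over parent choices, which only suits tiny graphs; the Chu-Liu/Edmonds algorithm is the standard efficient method. The function names are hypothetical.

```python
from itertools import product


def _reaches_root(v, parent, root, n):
    # Follow parent pointers; failing to hit the root within n steps means a cycle.
    for _ in range(n):
        if v == root:
            return True
        v = parent[v]
    return v == root


def min_arborescence_weight(nodes, weights, root):
    """Minimum total weight of a spanning arborescence rooted at `root`, or None.

    weights[(u, v)] is the (assumed) weight of directed channel u -> v.
    Exponential brute force: for tiny illustrative graphs only.
    """
    others = [v for v in nodes if v != root]
    choices = [[u for u in nodes if (u, v) in weights] for v in others]
    best = None
    for parents in product(*choices):
        parent = dict(zip(others, parents))
        if all(_reaches_root(v, parent, root, len(nodes)) for v in others):
            w = sum(weights[(parent[v], v)] for v in others)
            if best is None or w < best:
                best = w
    return best


def choose_leader(nodes, weights):
    """Lowest arb weight wins; ties broken by process id (assumption)."""
    candidates = []
    for r in nodes:
        w = min_arborescence_weight(nodes, weights, r)
        if w is not None:
            candidates.append((w, r))
    return min(candidates)[1]


nodes = ["x", "y", "z"]
weights = {("x", "y"): 1, ("x", "z"): 5, ("y", "z"): 1,
           ("z", "y"): 1, ("y", "x"): 2, ("z", "x"): 2}
print(choose_leader(nodes, weights))  # x (arb weight 2)
weights[("y", "z")] = 10              # e.g. after repeated failed reports on y -> z
print(choose_leader(nodes, weights))  # z (arb weight 3)
```

In the example, raising the weight of channel y → z makes x's best arb cost 6, so leadership moves to z.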
x Takes Leadership
• x calculates the weights of the arbs of all processes and finds that its own arb is the lowest
• x broadcasts startPhase carrying its arb, which limits the channels being
used
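On receiving startPhase, a process adopts the candidate's arb unless the phase# is stale. The arb tells it whom to expect alive from (its parent) and whom to forward alive to (its children). A minimal sketch with a hypothetical FollowerView class; representing an arb as a list of (parent, child) channels is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional

Edge = tuple[str, str]  # (parent, child) channel of an arborescence


@dataclass
class FollowerView:
    """What a process remembers about each leader candidate (hypothetical structure)."""
    me: str
    phase: dict[str, int] = field(default_factory=dict)       # candidate -> latest phase#
    arb: dict[str, list[Edge]] = field(default_factory=dict)  # candidate -> its arb

    def on_start_phase(self, leader: str, phase: int, arb: list[Edge]) -> bool:
        """Adopt the candidate's arb; ignore startPhase with an older or equal phase#."""
        if phase <= self.phase.get(leader, -1):
            return False
        self.phase[leader] = phase
        self.arb[leader] = list(arb)
        return True

    def parent(self, leader: str) -> Optional[str]:
        """Process whose alive I expect to receive timely (None if I am not in the arb)."""
        return next((p for p, c in self.arb.get(leader, []) if c == self.me), None)

    def children(self, leader: str) -> list[str]:
        """Processes to which I forward alive along the arb."""
        return [c for p, c in self.arb.get(leader, []) if p == self.me]


arb = [("x", "y"), ("y", "z"), ("x", "w"), ("y", "v")]
z = FollowerView("z")
z.on_start_phase("x", 1, arb)
print(z.parent("x"))  # y
```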
Leader x Sends alive
Goal: every correct process timely receives alive along the arborescence of the current phase#;
otherwise, x keeps track of failed messages to build a new arb should it lose leadership
(Figure: x sends alive(x, phase#, shoutID) along its arborescence; z does not receive it
timely and broadcasts failed(x, y, z))
Timely received via the arborescence:
• w and y timely receive x's alive from their parent in the
current phase# (alive with an earlier phase# is ignored)
  – y forwards alive according to the arborescence
or shouts (broadcasts) it
  – It is w's turn to shout alive
• v first receives alive from non-parent w, then waits to
receive it from its parent y (or turns on its timer if it was
off)
Not timely received by z:
• z broadcasts failed and increases its
timer to gauge the delay
• All other processes broadcast z's
failed once
• x receives failed for the first time and
increases the weight of channel yz, which is used to
calculate arbs for the next phase (should
x lose leadership)
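The failed bookkeeping on this slide can be sketched on both sides. The class names, the report_id used to recognize rebroadcast copies of the same failed message, the doubling of the follower's timeout, and the +1 weight increment are all assumptions for illustration; the slides only say that the timer and the edge weight are increased.

```python
from collections import defaultdict


class FollowerTimer:
    """Follower side: alive from my parent in the arb did not arrive in time (sketch)."""

    def __init__(self, me, leader, parent, timeout=1.0):
        self.me, self.leader, self.parent = me, leader, parent
        self.timeout = timeout

    def on_timer_expired(self):
        # Assumption: the timeout is doubled; the slides only say it is increased.
        self.timeout *= 2
        # failed(leader, parent, me): the channel parent -> me was not timely.
        return ("failed", self.leader, self.parent, self.me)


class LeaderEdgeWeights:
    """Leader side: count each failure report once, ignoring rebroadcast copies."""

    def __init__(self):
        self.weight = defaultdict(int)  # (src, dst) -> weight; higher means less timely
        self._seen = set()

    def on_failed(self, report_id, src, dst):
        if report_id in self._seen:      # another process's rebroadcast of a known report
            return False
        self._seen.add(report_id)
        self.weight[(src, dst)] += 1     # assumption: +1 per distinct report
        return True
```

Deduplication matters because every process rebroadcasts z's failed once, so the leader receives many copies of the same report.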
x's Timeout Expires: Is x Still the Leader?
x calculates a new arb, taking into account the updated edge weights of untimely channels
• x is still the leader if its arb is smaller than the other processes' arbs
  – Message efficient: x keeps using the old arb and never updates its phase# (no need to send
a new arb to all)
• x loses leadership otherwise:
  – x increases its phase# and broadcasts stopPhase
  – Message efficient: other processes use the stopPhase with the newer phase# to
turn off their timer for x and to stop sending failed
  – x stops sending alive
  – x keeps its own timer on to check whether it regains leadership
(Flowchart: timeout expires → calculate new arb → kept leadership: use old arb, send alive;
lost leadership: broadcast stopPhase(x, phase#) to y, w, z, v)
x Regains Leadership
x's arb is once again smaller than the others'
• The other leader must have sent stopPhase because of untimely channels
(Figure: x broadcasts startPhase(x, phases[x], arbs[x]) and then sends alive(x, phase#, shoutID))
• x's timeout expires (a process's own timer is always on)
• x kept collecting failed messages and updating its edge weights, so it calculates its new arb
• x broadcasts startPhase with the new arb to all other processes
  – So the other processes await alive messages and send failed if alive is not timely
received from their parent according to the new arb
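The leader's decision when its own timer expires, covering the kept and lost cases of the timeout slide and the regained case of this one, can be sketched as a single function. It assumes a hypothetical CandidateState, compares arb weights with ties broken by process id, and omits the arb and shoutID fields from the returned messages. Incrementing phase# again on regaining leadership, so that startPhase is not discarded as stale, is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class CandidateState:
    """Hypothetical per-candidate state: its id, current phase#, and leader flag."""
    me: str
    phase: int = 0
    is_leader: bool = False


def on_own_timeout(state: CandidateState, my_weight: int, others: dict[str, int]) -> list[tuple]:
    """Messages candidate `state.me` broadcasts when its own timer expires (sketch).

    my_weight: weight of its freshly recomputed arb; others: {process: arb weight}.
    """
    wins = all((my_weight, state.me) < (w, p) for p, w in others.items())
    if wins and state.is_leader:
        # Kept leadership: keep the old arb and phase#, just send alive.
        return [("alive", state.me, state.phase)]
    if wins:
        # Regained leadership: newer phase# (assumption) and startPhase with the new arb.
        state.is_leader = True
        state.phase += 1
        return [("startPhase", state.me, state.phase)]
    if state.is_leader:
        # Lost leadership: the newer phase# lets others turn off their timer for me.
        state.is_leader = False
        state.phase += 1
        return [("stopPhase", state.me, state.phase)]
    # Not leader and still not winning: stay quiet; the own timer stays on.
    return []
```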
MPO Algorithm: Message and Packet Efficient
Message Efficient: Only a single process x sends an infinite number of messages and
all but finitely many messages are of constant size
(Figure: x sends the constant-size alive(x, phase#, shoutID) infinitely often; the
non-constant-size message startPhase is sent only finitely many times)
Packet Efficient: All but finitely many messages are transmitted using O(n) packets
  – Packets of different messages may potentially use different channels
  – Thus, the number of used channels is not limited
(Figure: alive(x, phase#, shoutID) propagates from x to y, w, z, v)
Eventually, only 2n − 1 packets are sent per alive message
Broadcast (shouting) precludes super packet efficiency, which allows only O(n) channels
Thank you!
Questions?