Lecture 13

When Is Agreement Possible?
CS 188
Distributed Systems
February 24, 2015
CS 188,Winter 2015
Lecture 13
Page 1
Introduction
• Basics of agreement protocols
• Impossibility of agreement in
asynchronous system with failures
• When is agreement possible?
Basics of Agreement Protocols
• What is agreement?
• What are the necessary conditions for
agreement?
What Do We Mean By
Agreement?
• In simplest case, can n processors
agree that a variable takes on value 0
or 1?
– Only non-faulty processors need
agree
• More complex agreements can be built
from this simple agreement
Conditions for Agreement
Protocols
• Consistency
– All participants agree on same value
and decisions are final
• Validity
– Participants agree on a value at least
one of them wanted
• Termination
– All participants choose a value in a
finite number of steps
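The three conditions can be checked against a single finished run. This is a minimal sketch, not part of the lecture: the function name and the dictionary layout are assumptions, and it only looks at the final snapshot, so it cannot check that decisions were final along the way.

```python
from typing import Dict, Optional


def check_agreement(inputs: Dict[str, int],
                    decisions: Dict[str, Optional[int]]) -> Dict[str, bool]:
    """Check one finished run against the three conditions.

    inputs:    value each non-faulty processor wanted at the start
    decisions: value each non-faulty processor decided (None = never decided)
    """
    decided = [v for v in decisions.values() if v is not None]
    return {
        # Consistency: everyone who decided chose the same value
        "consistency": len(set(decided)) <= 1,
        # Validity: every decided value was wanted by at least one participant
        "validity": all(v in inputs.values() for v in decided),
        # Termination: every participant chose a value
        "termination": all(v is not None for v in decisions.values()),
    }
```

For example, a run where one processor never decides satisfies consistency but fails termination.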
Impossibility of Agreement in
Async System With Failures
• Assume a reliable, but asynchronous,
message passing system
– Any message may face arbitrary
delays
• Can a set of processors reach
agreement if one of the processors
fails?
Agreement Isn’t Always Possible
• In the general case for arbitrary
systems
• Adding some special properties to the
system may change that result
• But without those properties, provably
impossible
– A result sometimes abbreviated FLP
• For Fischer, Lynch, and Paterson,
who proved it
Model of the System
• The system consists of n processors
• The goal is for all non-faulty
processors to agree on value 0 or 1
• Rule out the trivial case of always
agreeing on 0 (or 1)
• Agreement depends on protocol, initial
state, and inputs to each processor
Bivalent and Univalent States
• A bivalent state is a system state that
could lead to either value being
decided
• A univalent state can only lead to one
of the values being decided
– 0-valent or 1-valent
• Valency must take allowable failures
into account!
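Valency can be made concrete on a small state graph. The sketch below is illustrative only: the state names and the graph are hypothetical, and a real analysis would also need edges for every allowable failure, as the slide warns.

```python
from typing import Dict, FrozenSet, List

# Hypothetical toy state graph: each state maps to the states one event away.
SUCCESSORS: Dict[str, List[str]] = {
    "start": ["a", "b"],
    "a": ["decided0"],
    "b": ["a", "decided1"],
    "decided0": [],
    "decided1": [],
}
# States in which a value has been decided.
DECISION: Dict[str, int] = {"decided0": 0, "decided1": 1}


def valency(state: str) -> FrozenSet[int]:
    """Return the set of values that can still be decided from `state`."""
    reachable, stack = set(), [state]
    while stack:
        s = stack.pop()
        if s in reachable:
            continue
        reachable.add(s)
        stack.extend(SUCCESSORS[s])
    return frozenset(DECISION[s] for s in reachable if s in DECISION)


def classify(state: str) -> str:
    """Name the valency of a state in the toy graph."""
    values = valency(state)
    if values == {0, 1}:
        return "bivalent"
    if values == {0}:
        return "0-valent"
    if values == {1}:
        return "1-valent"
    return "no decision reachable"
```

In this toy graph, "start" and "b" are bivalent, while "a" is 0-valent because only decided0 is reachable from it.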
System Configuration
• Processors have internal state
• State of network is the set of messages
sent, but not yet received
• Event e is the receipt of message m by
a processor
– Which can lead to sending one or
more new messages
– Events are deterministic
• A schedule is a sequence of events
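One way to picture this model in code is shown below. It is a sketch under assumptions: the names Config, step, run, and forward_once are invented here, internal state is a single integer, and a message is a (destination, payload) pair.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Message = Tuple[str, int]   # (destination processor, payload); assumed shape
Handler = Callable[[str, int, int], Tuple[int, List[Message]]]


@dataclass(frozen=True)
class Config:
    states: Tuple[Tuple[str, int], ...]   # (processor, internal state) pairs
    network: Tuple[Message, ...]          # messages sent but not yet received


def step(config: Config, event: Message, handler: Handler) -> Config:
    """Apply one event: the receipt of an in-flight message by its destination."""
    if event not in config.network:
        raise ValueError("message is not in flight")
    dest, payload = event
    states = dict(config.states)
    states[dest], sent = handler(dest, states[dest], payload)
    network = list(config.network)
    network.remove(event)    # the message has now been received
    network.extend(sent)     # receipt may send one or more new messages
    return Config(tuple(sorted(states.items())), tuple(sorted(network)))


def run(config: Config, schedule: List[Message], handler: Handler) -> Config:
    """A schedule is a sequence of events applied one after another."""
    for event in schedule:
        config = step(config, event, handler)
    return config


def forward_once(proc: str, state: int, payload: int) -> Tuple[int, List[Message]]:
    """Toy deterministic handler: add the payload; p1 also forwards it to p2."""
    sent = [("p2", payload)] if proc == "p1" else []
    return state + payload, sent


start = Config(states=(("p1", 0), ("p2", 0)), network=(("p1", 5),))
final = run(start, [("p1", 5), ("p2", 5)], forward_once)
```

Because the handler is deterministic, the same configuration and schedule always produce the same final configuration.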
Proving the Result
• Let’s assume the result is false
– That we can reach agreement with
one failure in these conditions
• Use an adversarial model
– Within rules of behavior, assume
adversary can force any legal event
• Look for contradictions
What Can the Adversary Do?
• Force any processor to perform an
event at any moment
• Choose any message to be delivered to
any processor when it requests a
message
• Delay any message arbitrarily long
• Once, it can kill one processor
permanently
The Necessity of Bivalency
• There has to be an initial bivalent
configuration for the system
• Why?
• If all processors started with value 1,
the system would decide 1
• If all processors started with value 0,
the system would decide 0
Intermediate Initial States
• Assume no initial state is bivalent, and look at
states where some processors start with value 0
and some with value 1
– Some initial states lead to result 1
– Some initial states lead to result 0
– All initial states lead to one or the
other
• So there is a 1-valent initial state that
differs from a 0-valent initial state by
one processor’s initial value
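The step from "some lead to 0, some lead to 1" to "two of them differ in one value" can be sketched by walking from the all-0 state to the all-1 state, flipping one input at a time. This is a simplification: majority below is a hypothetical stand-in for whatever value a univalent initial state leads to, not a real protocol.

```python
from typing import Callable, List, Tuple


def majority(inputs: List[int]) -> int:
    """Hypothetical stand-in for the value an initial state leads to."""
    return 1 if sum(inputs) * 2 > len(inputs) else 0


def find_adjacent_pair(n: int,
                       outcome: Callable[[List[int]], int]
                       ) -> Tuple[List[int], List[int], int]:
    """Flip inputs 0 -> 1 one at a time until the outcome changes.

    Returns (state_before, state_after, index_of_flipped_processor).
    """
    state = [0] * n
    for i in range(n):
        nxt = state.copy()
        nxt[i] = 1
        if outcome(state) != outcome(nxt):
            return state, nxt, i
        state = nxt
    raise ValueError("outcome never changes, so the protocol is trivial")
```

Since the all-0 state leads to 0 and the all-1 state leads to 1, the outcome has to change somewhere along the walk, and the two states on either side of that change differ in exactly one processor's initial value.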
A Graphical Representation
[Figure: State x and State y, drawn among the 1-valent and 0-valent initial states]
State x: Node 1: 0, Node 2: 1, Node 3: 1, ..., Node N: 0
State y: Node 1: 0, Node 2: 1, Node 3: 1, ..., Node N: 1
• What’s in these states?
• They differ in only one value
Why Does This Imply
Bivalence?
• What if that one differing processor is
the processor that fails?
• The system must still reach agreement
from the remaining states
– Which are identical, now
• But on what value?
Is This Possible?
[Figure: State x and State y, drawn among the 0-valent and 1-valent initial states]
State x: Node 1: 0, Node 2: 1, Node 3: 1, ..., Node N: 0
State y: Node 1: 0, Node 2: 1, Node 3: 1, ..., Node N: 1
• Does the system decide on 1?
– Then the state we took to be 0-valent wasn’t 0-valent, after all
• Does the system decide on 0?
– Then the state we took to be 1-valent wasn’t 1-valent, after all
• Looks like x and y must be bivalent
So What?
• So there has to be at least one bivalent
initial state
• Why’s that so bad?
• If the system never leaves a bivalent
state, it never makes a decision
• We must show our adversary can’t
perpetually force bivalency
The Persistence of Bivalency
• Let’s assume bivalency doesn’t persist
• At some point, some bivalent state
must transition to a univalent state
– Implying at least two events
• One to go to 0-valent
• One to go to 1-valent
• With no events leading to bivalent
states
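A Graphical Representation
[Figure: configuration C with two outgoing arrows; event e leads to D, event e’ leads to D’]
• Remember, these events are each delivery of
a message
• So m and m’ (the messages delivered by e and e’)
must have been in the message delivery system
state simultaneously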
Looking Closely at Events
e and e’
• What would happen if we executed e
first, then e’?
• What would happen if we executed
them in the opposite order?
• Well, why should I care?
• Would executing them in either order
lead to the same state?
• If so, there’s a contradiction
Order of Events e and e’
[Figure: from C, event e leads to D and event e’ leads to D’; then e’ is applied at D and e is applied at D’]
Why Should They Lead to the
Same State?
• What if e and e’ occur on different
processors?
• Then they’re independent events
• So they should produce the same result
if executed in either order
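• That same result would be reachable from both
the 0-valent state and the 1-valent state, which
is a contradiction
• So e and e’ could not have occurred on
different processors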
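The independence claim can be illustrated with a tiny model. This is a sketch under assumptions: each processor's state is just the ordered history of payloads it has received, and any messages sent in response are ignored.

```python
from typing import Dict, Tuple

Event = Tuple[str, int]   # (receiving processor, payload); assumed shape


def apply(states: Dict[str, Tuple[int, ...]], event: Event) -> Dict[str, Tuple[int, ...]]:
    """Deliver one message: only the receiving processor's state changes."""
    proc, payload = event
    new_states = dict(states)
    new_states[proc] = states[proc] + (payload,)   # order-sensitive local history
    return new_states


C = {"p": (), "q": ()}

# e and e' on different processors: either order gives the same configuration.
e, e_prime = ("p", 1), ("q", 2)
different_procs_commute = apply(apply(C, e), e_prime) == apply(apply(C, e_prime), e)

# Both events on the same processor p: the order is visible in p's state.
f, f_prime = ("p", 1), ("p", 2)
same_proc_commute = apply(apply(C, f), f_prime) == apply(apply(C, f_prime), f)
```

Here different_procs_commute is True and same_proc_commute is False, which matches the slide: events on different processors cannot be what separates the 0-valent path from the 1-valent path.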
Could the Events Occur on the
Same Processor P?
• If e was first, the state became 0-valent
• If e’ was first, the state became 1-valent
• But what if P then fails?
• Since the event happened only at P,
only P sees the effects
• So we’re still in a bivalent state
Recapitulating the Argument
• It’s possible to start in a bivalent state
• There must be some point at some
processor P at which the bivalent state
changes to univalent
• If P fails before anyone knows the
valency, the system becomes bivalent
– And can never settle to univalency
• Perpetual bivalency implies no
agreement
When Is Agreement Possible?
• Didn’t we show in the last class that we
can reach agreement if fewer than 1/3 of
our processors are faulty?
• Yes, but only if the message passing
system is synchronous
• Whether agreement is possible in a
system depends on certain parameters
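As a quick arithmetic check on the one-third bound mentioned above, the helper below computes the largest number of faulty processors f that is still strictly below n/3. The function name is made up for this sketch.

```python
def max_faulty(n: int) -> int:
    """Largest f with f < n / 3, i.e. 3 * f < n."""
    if n < 1:
        raise ValueError("need at least one processor")
    return (n - 1) // 3
```

So 3 processors tolerate no faulty processor under this bound, 4 tolerate one, and 7 tolerate two.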
Parameters for Agreement In
Distributed Systems
• Synchronous vs. asynchronous
processors
• Bounded vs. unbounded
communications delay
• Ordered vs. unordered messages
• Point-to-point vs. broadcast
communications
Synchronous vs. Asynchronous
Processors
• Synchronous processors imply that all
processors make progress predictably
• More precisely, there is a constant s
such that
– for every s+1 steps taken by Pi
– all Pj will take at least one step
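The definition can be tested against a recorded trace of steps. This is a sketch under one reading of the slide: within any stretch of the trace that contains s+1 consecutive steps of Pi, every other processor must appear at least once. The trace format (a list of processor names, one per step) is an assumption.

```python
from typing import List


def is_synchronous(trace: List[str], processors: List[str], s: int) -> bool:
    """Check that no processor takes s+1 steps while some other takes none."""
    for pi in processors:
        positions = [t for t, p in enumerate(trace) if p == pi]
        # Each window runs from pi's k-th step to its (k+s)-th step inclusive,
        # so it contains exactly s+1 steps of pi.
        for k in range(len(positions) - s):
            window = trace[positions[k]:positions[k + s] + 1]
            if not all(pj in window for pj in processors):
                return False
    return True
```

A strictly alternating trace of two processors passes with s = 1, while a trace in which one processor takes two steps in a row with nothing in between fails for s = 1.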
Bounded vs. Unbounded
Communications Delay
• Delay is bounded if and only if all
messages arrive at their destination
within t steps
– Implies no lost messages
• Doesn’t imply messages arrive in the
order sent
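A small example shows why a delay bound says nothing about ordering. The record format here, a (step sent, step received) pair with None for a message that never arrives, is an assumption made for this sketch.

```python
from typing import List, Optional, Tuple

Delivery = Tuple[int, Optional[int]]   # (step sent, step received or None)


def is_bounded(deliveries: List[Delivery], t: int) -> bool:
    """True if every message arrived within t steps of being sent."""
    return all(recv is not None and recv - sent <= t
               for sent, recv in deliveries)


def arrives_in_send_order(deliveries: List[Delivery]) -> bool:
    """True if arrival order matches send order (assumes all were delivered)."""
    recvs = [recv for _, recv in sorted(deliveries, key=lambda d: d[0])]
    return recvs == sorted(recvs)


# Sent at steps 0 and 1, received at steps 5 and 3.
example = [(0, 5), (1, 3)]
```

With t = 5 the example is bounded, yet the second message overtakes the first. A lost message (None) makes the delay unbounded for any t.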
Ordered vs. Unordered Messages
• Messages are ordered if they are
received in the same real time order as
their sending
– Using true real time
• In some cases, merely receiving all
messages in same order at all
processors is enough
Point-to-Point vs. Broadcast
Communications
• Point-to-point communications means
a given message sent by Pi is seen only
by its destination Pj
• Broadcast communications means that
Pi can send a message to all other
processors in a single atomic step
• Most typically by hardware broadcast
So, When Can We Reach
Agreement?
• Case 1: Processors are synchronous
and communications delay is bounded
• Case 2: Messages are ordered and the
transmission medium is broadcast
• Case 3: Processors are synchronous
and messages are ordered
• And that’s it
– (Case 1 covers Byzantine agreement)
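The three cases can be written as a predicate over the four system parameters. This is only a restatement of the slide in code; the function and argument names are invented here.

```python
from itertools import product


def agreement_possible(sync_processors: bool, bounded_delay: bool,
                       ordered_messages: bool, broadcast: bool) -> bool:
    """True if the parameter combination falls under Case 1, 2, or 3."""
    case1 = sync_processors and bounded_delay
    case2 = ordered_messages and broadcast
    case3 = sync_processors and ordered_messages
    return case1 or case2 or case3


# Enumerate all 16 combinations of the four parameters.
possible = [combo for combo in product([False, True], repeat=4)
            if agreement_possible(*combo)]
```

The fully asynchronous setting of the impossibility proof (all four arguments False) falls outside every case, as expected.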
What Does This Result Mean?
• For the practical systems we really build:
• Not that we can never reach agreement
– Good systems almost always do
• But that we generally can’t guarantee it
• Which implies that our systems should
tolerate disagreements
– At some times
– Under some conditions
When Is Disagreement OK?
• Preferably, when it doesn’t matter
– E.g., when reasonable results are
possible even without agreement
• Or when it eventually works itself out
– With possible inconsistencies in the
meantime
• Or, at worst, when it is visible to
people who can fix it
When Is Disagreement Not OK?
• When the consequences of
disagreement are dire
• When it results in unfixable problems
• When its consequences are invisible,
but relevant
• Unfortunately, we don’t always get to
choose when we can avoid it
Minimizing Chances of
Disagreement
• Understand when agreement is most
critical
• In those cases, use protocols that are
less likely to fail on agreement
– Which usually have heavy expenses
– So don’t always use them
A Classification of Faults
• More detailed than previously
discussed
• Produced by fault-tolerant computing
community
• Divides faults into classes
– Stronger class is subset of weaker
class
An Ordered Fault Classification
Byzantine
Authenticated Byzantine
Incorrect Computation
Timing
Omission
Crash
Fail Stop
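The ordering above can be captured as an ordered enumeration. This sketch assumes the list is strictly nested from Fail Stop (most restricted) up to Byzantine (most general), so handling a class also handles every class below it in the list; the names Fault and covers are made up here.

```python
from enum import IntEnum


class Fault(IntEnum):
    """Fault classes, numbered from most restricted to most general."""
    FAIL_STOP = 1
    CRASH = 2
    OMISSION = 3
    TIMING = 4
    INCORRECT_COMPUTATION = 5
    AUTHENTICATED_BYZANTINE = 6
    BYZANTINE = 7


def covers(tolerated: Fault, actual: Fault) -> bool:
    """True if a design that tolerates `tolerated` also handles `actual`."""
    return actual <= tolerated
```

Under this reading, covers(Fault.BYZANTINE, f) holds for every class f, while a design that only tolerates crash faults does not cover omission faults.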
Fail Stop Faults
• A processor ceases operation
• But informs other processors in
computation that it has stopped
• Relatively easy to deal with
Crash Fault
• A processor crashes or loses internal
state and halts
• Without notification to anyone else
• Hard to distinguish from a really slow
processor
Omission Faults
• A processor fails to do something in
time
– Like respond to a message
• But otherwise it may still be operating
correctly
– Or it may have crashed
Timing Fault
• A processor completes a task before or
after the window when it should
– Or never
• E.g., a late acknowledgement to a
message
Incorrect Computation Fault
• A processor fails to produce the correct
results for a given set of input
• Which could be merely not producing
the results soon enough
• Or could be sending back trash
Authenticated Byzantine Fault
• Processor performs an arbitrary or
malicious fault
• But authentication mechanisms note
any alterations made to others’
messages
Byzantine Fault
• Any and every fault
• Having arbitrarily bad consequences
• Possibly working in combination with
other faults to produce really bad
results
• In this classification, all other faults are
subclasses of Byzantine faults