Zyzzyva

ZYZZYVA:
SPECULATIVE BYZANTINE
FAULT TOLERANCE
R. Kotla, L. Alvisi, M. Dahlin,
A. Clement and E. Wong
U. T. Austin
Best Paper Award at SOSP 2007
Motivation
• Why implement Byzantine Fault-Tolerant
replication?
– Increasing value of data and decreasing cost
of hardware
– More non-fail-stop failures than commonly believed
– BFT is becoming cheaper
– Cost of 3-way non-BFT replication close to
cost of BFT replication
Zyzzyva (I)
• Uses speculation to reduce the cost of BFT
replication
– Primary replica proposes order of client
requests to all secondary replicas (standard)
– Secondary replicas speculatively execute the
request without going through an agreement
protocol to validate that order (new idea)
Zyzzyva (II)
• As a result
– States of correct replicas may diverge
– Replicas may send diverging replies to client
• Zyzzyva’s solution
– Clients detect inconsistencies
– Help convergence of correct replicas to a
single total ordering of requests
– Reject inconsistent replies
How?
• Clients observe a replicated state machine
• Replies contain enough information to let clients
ascertain if the replies and the history are stable
and guaranteed to be eventually committed
• Replicas have checkpoints
Byzantine agreement (I)
• No solution with fewer than four entities when
one of them is faulty
Byzantine agreement (II)
• To achieve agreement in the presence of f failed
nodes (“traitors”) we need
– 3f + 1 entities
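The bound above can be turned into a small calculation. The following is a minimal sketch; the function names `min_replicas` and `max_faults` are illustrative and do not come from the slides.

```python
def min_replicas(f: int) -> int:
    """Smallest number of entities needed to tolerate f Byzantine faults."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def max_faults(n: int) -> int:
    """Largest f such that n entities still satisfy n >= 3f + 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // 3
```

For f = 1 this gives four entities, which matches the "fewer than four" impossibility on the previous slide.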
Practical BFT (I)
• Practical Byzantine Fault-Tolerant protocol
(PBFT) [Castro and Liskov 1999]
Practical BFT (II)
Replicas decide on correct ordering
Practical BFT (III)
1. Client sends signed request to primary replica
2. Primary assigns a sequence number to the request
and sends to all other replicas a
PRE-PREPARE message
3. Secondary replicas validate the message and send
a PREPARE message to all replicas
4. Replicas that can collect 2f PREPARE messages
send a COMMIT message to all replicas
5. Replicas that can collect 2f + 1 COMMIT messages
send a REPLY to the client
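Steps 4 and 5 amount to counting distinct senders against two thresholds. The sketch below tracks one request at one replica; the class name `SlotState` and its methods are hypothetical, and message validation and signatures are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class SlotState:
    """Quorum bookkeeping for a single sequence number (illustrative only)."""
    f: int
    prepares: set = field(default_factory=set)  # replicas that sent PREPARE
    commits: set = field(default_factory=set)   # replicas that sent COMMIT

    def on_prepare(self, sender: int) -> bool:
        """Record a PREPARE; True once 2f distinct senders are seen (step 4)."""
        self.prepares.add(sender)
        return len(self.prepares) >= 2 * self.f

    def on_commit(self, sender: int) -> bool:
        """Record a COMMIT; True once 2f + 1 distinct senders are seen (step 5)."""
        self.commits.add(sender)
        return len(self.commits) >= 2 * self.f + 1
```

Using sets means that a replica repeating its message is counted only once.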
A shortened version
Faster agreement is achieved thanks to
a more complex view change protocol
The explanation (I)
• "No replicated service that uses the traditional
view change protocol can be live without an
agreement protocol that includes both the
prepare and commit full exchanges"
• "The traditional view change protocol lets correct
replicas commit to a view change and become
silent in a view without any guarantee that their
action will lead to the view change."
The explanation (II)
• Zyzzyva
– Adds an extra phase to its view change
protocol
– Guarantees that a correct replica will not
abandon a view unless every other correct
replica does so as well
Zyzzyva Agreement (I)
• Common case: no faulty replicas
Explanations
• Secondary replicas assume that
– Primary replica gave the right ordering
– All secondary replicas will participate in
the transaction
• Initiate speculative execution
• Client receives 3f + 1 mutually consistent
responses
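The client-side check for the common case can be sketched as follows. It assumes each response is reduced to a hashable pair such as (reply, history digest); that representation and the function names are assumptions made for illustration.

```python
from collections import Counter


def largest_matching_set(responses: dict):
    """Return (value, count) for the most common response.

    `responses` maps a replica id to a hashable response value.
    """
    if not responses:
        return None, 0
    value, count = Counter(responses.values()).most_common(1)[0]
    return value, count


def fast_path_complete(responses: dict, f: int) -> bool:
    """True when 3f + 1 replicas returned mutually consistent responses."""
    _, count = largest_matching_set(responses)
    return count >= 3 * f + 1
```

With f = 1, the check succeeds only if all four replicas agree.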
Zyzzyva Agreement (II)
• With a faulty replica
Explanations (I)
• Client receives fewer than 3f + 1 but at least
2f + 1 mutually consistent responses
• Distributes a commit certificate to the replicas
• Once at least 2f + 1 replicas acknowledge
receiving a commit certificate, the client
considers the request completed
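The thresholds on this slide and the common-case slide can be combined into one client decision. This is a sketch with hypothetical names; the `INSUFFICIENT` outcome is a placeholder, since the slides do not say what the client does with fewer than 2f + 1 matching responses.

```python
from enum import Enum


class Outcome(Enum):
    COMPLETE = "complete"                                # 3f + 1 matching responses
    SEND_COMMIT_CERTIFICATE = "send_commit_certificate"  # 2f + 1 to 3f matching
    INSUFFICIENT = "insufficient"                        # not covered by the slides


def classify_responses(matching: int, f: int) -> Outcome:
    """Map the number of mutually consistent responses to a client action."""
    if matching >= 3 * f + 1:
        return Outcome.COMPLETE
    if matching >= 2 * f + 1:
        return Outcome.SEND_COMMIT_CERTIFICATE
    return Outcome.INSUFFICIENT


def certificate_acknowledged(acks: int, f: int) -> bool:
    """True once at least 2f + 1 replicas acknowledged the commit certificate."""
    return acks >= 2 * f + 1
```

In the second case the request counts as completed only after `certificate_acknowledged` becomes true.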
Explanations (II)
• If enough secondary replicas suspect that the
primary replica is faulty, a view change is
initiated and a new primary elected
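A toy model of this trigger is shown below. Two things are assumptions rather than statements from the slides: the number of suspicions that counts as "enough" is passed in as `threshold`, and the new primary is chosen round-robin as `view % n`.

```python
class ViewChangeTracker:
    """Counts suspicions against the current primary (illustrative only)."""

    def __init__(self, n: int, threshold: int):
        self.n = n                  # total number of replicas
        self.threshold = threshold  # assumed value of "enough" suspicions
        self.view = 0
        self.suspecters = set()

    @property
    def primary(self) -> int:
        # Assumption: round-robin choice of the primary for each view.
        return self.view % self.n

    def on_suspicion(self, replica_id: int) -> bool:
        """Record a suspicion; return True if it triggers a view change."""
        self.suspecters.add(replica_id)
        if len(self.suspecters) >= self.threshold:
            self.view += 1
            self.suspecters.clear()
            return True
        return False
```

The sketch covers only the trigger, not the view change sub-protocol itself.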
Comparison with traditional solutions
State maintained at each replica
Explanations (I)
• Each replica maintains
– A history of the requests it has executed
– A copy of the max commit certificate it has
received
• These let it distinguish between committed
history and speculative history
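One way to picture this state is a single ordered history plus a marker for the highest sequence number covered by a commit certificate. The sketch below makes that simplification; the class and field names are hypothetical, and a certificate is represented only by its sequence number.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaHistory:
    history: list = field(default_factory=list)  # executed requests, in order
    max_cc_seq: int = 0  # sequence number of the max commit certificate (0 = none)

    def execute(self, request) -> int:
        """Append a request and return its 1-based sequence number."""
        self.history.append(request)
        return len(self.history)

    def on_commit_certificate(self, seq: int) -> None:
        """Keep only the certificate covering the highest sequence number."""
        self.max_cc_seq = max(self.max_cc_seq, seq)

    def committed(self) -> list:
        """Prefix of the history covered by the max commit certificate."""
        return self.history[: self.max_cc_seq]

    def speculative(self) -> list:
        """Suffix executed speculatively and not yet covered by a certificate."""
        return self.history[self.max_cc_seq :]
```

Everything past the certificate's sequence number is still speculative.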
Explanations (II)
• Each replica constructs a checkpoint every
CP_INTERVAL requests
• It maintains one stable checkpoint with a
corresponding stable application state
snapshot
• It might also have up to one speculative
checkpoint with its corresponding speculative
application state snapshot
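The checkpoint bookkeeping described above might look like the following toy model. The value of `CP_INTERVAL` is arbitrary (the slides give none), the application state is a plain dictionary, and replacing an older speculative checkpoint with a newer one is a simplification of this sketch.

```python
CP_INTERVAL = 4  # illustrative value only


class CheckpointingReplica:
    def __init__(self):
        self.executed = 0
        self.state = {}          # toy application state
        self.stable = (0, {})    # (sequence number, state snapshot)
        self.speculative = None  # at most one speculative checkpoint

    def execute(self, key, value) -> None:
        """Apply a request and take a checkpoint every CP_INTERVAL requests."""
        self.state[key] = value
        self.executed += 1
        if self.executed % CP_INTERVAL == 0:
            # Copy the state so that later requests do not alter the snapshot.
            self.speculative = (self.executed, dict(self.state))

    def make_stable(self) -> None:
        """Promote the speculative checkpoint once it has been committed."""
        if self.speculative is not None:
            self.stable = self.speculative
            self.speculative = None
```

How a checkpoint becomes committed is outside this sketch; `make_stable` only models the promotion.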
Explanations (III)
• Checkpoints and application state become
committed through a process similar to that of
earlier BFT agreement protocols
– Replicas send signed checkpoint messages
to all replicas when they generate a tentative
checkpoint
– They commit the checkpoint after collecting f + 1
signed matching checkpoint messages
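The f + 1 rule can be sketched as a vote counter keyed by checkpoint identity. The names are hypothetical, a checkpoint is identified here by an assumed (sequence number, digest) pair, and signature verification is omitted.

```python
from collections import defaultdict


class CheckpointVotes:
    def __init__(self, f: int):
        self.f = f
        # (sequence number, state digest) -> set of replicas that vouched for it
        self.votes = defaultdict(set)

    def on_checkpoint_message(self, sender: int, seq: int, digest: str) -> bool:
        """Record a checkpoint message; True once f + 1 matching ones exist."""
        key = (seq, digest)
        self.votes[key].add(sender)
        return len(self.votes[key]) >= self.f + 1
```

Messages with different digests for the same sequence number are counted separately, so they never add up to a match.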
View change sub-protocol (I)
Explanations
• Two-phase protocol
• Elects a new primary
• Guarantees that it will not introduce any changes
in a history that has already completed at a
correct client
Performance: throughput
Comments
• Zyzzyva-5 is a variant of Zyzzyva
requiring more replicas (5f + 1) but having a lower
overhead
Performance: latency
Scalability: peak throughputs
CONCLUSIONS
• Systematically exploiting speculative execution
results in a protocol much faster than
conventional BFT agreement protocols.
• Zyzzyva is optimized for the most
frequent case but provides the correct result in
all cases
• A good rule to follow