EEC 688/788
Secure and Dependable Computing
Lecture 12
Wenbing Zhao
Department of Electrical and Computer Engineering
Cleveland State University
[email protected]
Reminder

Time to work on project!
Project outline due: Nov 11th in class (hardcopy, no extension!)
Topic, title, list of 5 papers

Outline

Distributed consensus and Paxos algorithms
Multi-Paxos
Dynamic Paxos
Cheap Paxos

7/13/2017
Multi-Paxos: Paxos for State Machine Replication

Client: partially assumes the role of a proposer
Server replicas: acceptors
Value to be agreed on: total ordering of requests sent by clients
Total ordering accomplished by running a sequence of instances of Paxos
Each instance is assigned a sequence number, representing the total ordering of the request that is chosen
Value to be chosen: the request chosen for the instance
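
The idea that each Paxos instance fixes the request at one sequence number can be sketched as a small log that executes requests strictly in sequence-number order. This is only an illustration: the class name `ReplicaLog`, the method `learn`, and the choice to start sequence numbers at 1 are assumptions, not part of the lecture.

```python
from typing import Dict, List


class ReplicaLog:
    """Maps Paxos instance (sequence) numbers to the request chosen in that instance."""

    def __init__(self) -> None:
        self.chosen: Dict[int, str] = {}   # seq# -> chosen request
        self.next_to_execute = 1           # assumption: sequence numbers start at 1
        self.executed: List[str] = []      # requests in the order they were executed

    def learn(self, seq: int, request: str) -> None:
        """Record the value chosen for instance `seq`, then execute whatever is ready."""
        self.chosen[seq] = request
        # Execute strictly in sequence-number order; stop at the first gap.
        while self.next_to_execute in self.chosen:
            self.executed.append(self.chosen[self.next_to_execute])
            self.next_to_execute += 1


log = ReplicaLog()
log.learn(2, "deposit 10")     # instance 2 is learned first: nothing can execute yet
log.learn(1, "open account")   # instance 1 fills the gap: both requests now execute
```

Every replica that learns the same chosen values ends up with the same `executed` list, regardless of the order in which the instances complete.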
Multi-Paxos: Paxos for State Machine Replication

Client: partially assumes the role of a proposer
Only proposes a value (i.e., the request it sends), without the corresponding proposal number
A server replica, the primary, decides on the proposal number
Primary: essentially the proposer in Paxos
Proposes a binding of a sequence number to a request
Propagates the value chosen (i.e., total ordering info) to other replicas (i.e., learners)
Initial membership is known, with a sole primary
First phase can be omitted during normal operation
When the primary is suspected, a new primary is elected (view change)
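
A minimal sketch of the primary's side of normal operation, assuming the first phase is skipped as stated above: the primary binds the next sequence number to a client request, sends a P2a message, and treats the request as chosen once a majority has replied with P2b. The class `Primary`, its fields, and the tuple layout of the P2a message are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Primary:
    n_replicas: int                      # 2f+1 replicas in total
    own_id: int = 0                      # assumption: replica 0 is the primary
    view: int = 0
    next_seq: int = 1
    pending: Dict[int, str] = field(default_factory=dict)     # seq# -> request
    acks: Dict[int, Set[int]] = field(default_factory=dict)   # seq# -> ids that accepted

    def on_request(self, request: str) -> Tuple[int, int, str]:
        """Bind the next sequence number to the request and return the P2a message."""
        seq = self.next_seq
        self.next_seq += 1
        self.pending[seq] = request
        self.acks[seq] = {self.own_id}   # the primary accepts its own proposal
        return (self.view, seq, request)

    def on_p2b(self, seq: int, replica_id: int) -> bool:
        """Record a P2b reply; return True once a majority has accepted."""
        self.acks[seq].add(replica_id)
        return len(self.acks[seq]) > self.n_replicas // 2


primary = Primary(n_replicas=3)
p2a = primary.on_request("put x=1")              # (view, seq#, request)
chosen = primary.on_p2b(seq=1, replica_id=1)     # 2 of 3 replicas have accepted
```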
Multi-Paxos: Paxos for State Machine Replication

Figure: Normal operation of Multi-Paxos. Participants: Client, Replica 0 (Primary), Replica 1 (Backup), Replica 2 (Backup). Messages: REQUEST, P2a, P2b, COMMIT, REPLY. Phases: Accept Phase, Learning Phase, Execution.

Multi-Paxos: Checkpointing and Garbage Collection

Paxos is open-ended: it never terminates
A proposer is allowed to initiate a new proposal even if every acceptor has accepted a proposal
An acceptor must remember the last proposal that it has accepted and the latest proposal number it has promised
In Multi-Paxos, every replica must remember such info for every instance of Paxos: this would need unbounded memory
Solution: periodic checkpointing, e.g., once for every n requests
Garbage collect logs after taking each checkpoint
Request or control msg needed by a slow replica may not be available anymore after a checkpoint => state transfer
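
The checkpoint-then-garbage-collect cycle can be sketched as follows. This is a simplified, single-replica illustration: the class `CheckpointingLog` and its methods are hypothetical, and the agreement among replicas that would normally make a checkpoint stable is left out.

```python
from typing import Dict, List, Optional


class CheckpointingLog:
    def __init__(self, checkpoint_interval: int) -> None:
        self.interval = checkpoint_interval   # take a checkpoint once every n requests
        self.entries: Dict[int, str] = {}     # seq# -> request, kept since last checkpoint
        self.stable_checkpoint = 0            # seq# covered by the last checkpoint
        self.state: List[str] = []            # stand-in for the application state

    def execute(self, seq: int, request: str) -> None:
        self.entries[seq] = request
        self.state.append(request)
        if seq % self.interval == 0:
            self.take_checkpoint(seq)

    def take_checkpoint(self, seq: int) -> None:
        """Record the checkpoint and garbage collect log entries it covers."""
        self.stable_checkpoint = seq
        self.entries = {s: r for s, r in self.entries.items() if s > seq}

    def fetch(self, seq: int) -> Optional[str]:
        """Return a logged request, or None if it was garbage collected.

        None means a slow replica asking for it needs a state transfer instead.
        """
        return self.entries.get(seq)


log = CheckpointingLog(checkpoint_interval=3)
for seq in range(1, 5):
    log.execute(seq, f"r{seq}")
```

After the loop, the log holds only request 4; a replica that still needs request 2 can no longer be served from the log and has to obtain the checkpointed state.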
Multi-Paxos: Leader Election and View Change

Leader election: can be done using a full Paxos instance
New primary needs to determine if a value has been chosen in each incomplete instance of Paxos
Leader election and history determination can be done in a single full Paxos instance: view change
Figure: View change starting from View v. Participants: Replica 0 (primary for v), Replica 1 (primary for v+1), Replica 2. Messages: VIEW_CHANGE in the View Change (prepare phase), NEW_VIEW in the New View Installation (accept phase).
Multi-Paxos: View Change

A set of 2f+1 replicas, replica id: 0,1,…,2f
History of system: a sequence of views
Each view: one and only one primary
Initially replica 0 assumes the primary role for v=0
Subsequently, replicas take the primary role in a round-robin fashion
To ensure liveness
A replica starts a view change timer on the initiation of each instance of Paxos
If the replica does not learn the request chosen before timer expires => suspect the primary
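
One common way to realize the round-robin rule is to take the view number modulo the number of replicas; the slide does not spell out the formula, so `view % (2f+1)` is an assumption here. The timer class below is likewise a hypothetical sketch that uses explicit timestamps instead of a real clock.

```python
from typing import Dict


def primary_for_view(view: int, f: int) -> int:
    """Round-robin primary: replica 0 for view 0, replica 1 for view 1, and so on."""
    n = 2 * f + 1
    return view % n


class ViewChangeTimer:
    """Tracks one deadline per Paxos instance that has been started but not learned."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.deadlines: Dict[int, float] = {}   # seq# -> time at which the primary is suspected

    def start(self, seq: int, now: float) -> None:
        self.deadlines[seq] = now + self.timeout

    def learned(self, seq: int) -> None:
        # The request for this instance was learned in time: cancel its timer.
        self.deadlines.pop(seq, None)

    def primary_suspected(self, now: float) -> bool:
        return any(now >= deadline for deadline in self.deadlines.values())


timer = ViewChangeTimer(timeout=5.0)
timer.start(seq=1, now=0.0)
```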
View Change

On suspecting the primary, a replica broadcasts a view change message to all
Current primary, if it is wrongly suspected, joins the view change anyway (i.e., it steps down from primary role)
A replica joins the view change even if its view change timer has not expired yet
On joining view change, a replica stops accepting normal control msgs and responds only to checkpointing and view change msgs
View Change

View change msg contains
New view #
Seq# of last stable checkpoint
A set of accepted records since last stable checkpoint
Each record consists of view#, seq#, request msg
On receiving f+1 view change msgs, new primary sends new view msg
Includes a set of accept msgs
Includes accept msgs for all accepted records received in the view change msgs
When a gap in seq# is detected, create an accept request with no-op
A replica accepts new view msg if it has not installed a newer view
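
The construction of the accept messages carried by the new view msg can be sketched as below. The type names, the `NO_OP` marker, and the function `build_new_view` are hypothetical. Two details are assumptions borrowed from classic Paxos rather than stated on the slide: when two records exist for the same seq#, the one from the higher view wins, and the range starts right after the highest stable checkpoint reported.

```python
from typing import Dict, List, NamedTuple

NO_OP = "no-op"   # hypothetical marker for the request used to fill a gap


class Accepted(NamedTuple):
    view: int
    seq: int
    request: str


class ViewChange(NamedTuple):
    new_view: int
    checkpoint_seq: int          # seq# of last stable checkpoint
    accepted: List[Accepted]     # accepted records since that checkpoint


def build_new_view(msgs: List[ViewChange], f: int) -> List[Accepted]:
    """Return the accept msgs for the new view, filling seq# gaps with no-ops."""
    if len(msgs) < f + 1:
        raise ValueError("need at least f+1 view change msgs")
    new_view = msgs[0].new_view
    start = max(m.checkpoint_seq for m in msgs)
    best: Dict[int, Accepted] = {}
    for m in msgs:
        for rec in m.accepted:
            if rec.seq <= start:
                continue                      # already covered by a checkpoint
            if rec.seq not in best or rec.view > best[rec.seq].view:
                best[rec.seq] = rec           # assumption: highest view wins
    if not best:
        return []
    accepts = []
    for seq in range(start + 1, max(best) + 1):
        request = best[seq].request if seq in best else NO_OP
        accepts.append(Accepted(new_view, seq, request))
    return accepts


msgs = [
    ViewChange(1, 10, [Accepted(0, 11, "a"), Accepted(0, 13, "c")]),
    ViewChange(1, 10, [Accepted(0, 11, "a")]),
]
accepts = build_new_view(msgs, f=1)
```

In this example nobody reported a record for seq# 12, so the new primary proposes a no-op there and the sequence stays free of gaps.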
Dynamic Paxos

Designed to accommodate reconfiguration
Extends the majority concept to quorum
Classic Paxos => uses static quorum
Dynamic quorum: quorum size may change dynamically
Cheap Paxos uses dynamic quorum
Dynamic Paxos

Fewer replicas are required by using spare replicas and reconfiguration, provided no other fault occurs during reconfiguration
Without reconfiguration, 3f+1 replicas can only tolerate up to 3f/2 faulty replicas
2f+1 active replicas plus f spares can tolerate up to 3f-1 faulty replicas via substitution and reconfiguration
As long as 1 active replica and 1 spare are operating
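
The two bounds can be checked with a little arithmetic. The static bound below is derived from the usual majority rule (a static quorum of n replicas survives as long as a majority is up), which gives the slide's 3f/2 rounded down; the dynamic bound simply restates the 3f-1 figure from the slide. The function names are hypothetical.

```python
def max_faults_static(n: int) -> int:
    """With a static majority quorum, n replicas tolerate (n - 1) // 2 faults."""
    return (n - 1) // 2


def max_faults_dynamic(f: int) -> int:
    """Bound quoted on the slide for 2f+1 active replicas plus f spares."""
    return 3 * f - 1


f = 2
total_replicas = 3 * f + 1                            # 5 active + 2 spares = 7
static_bound = max_faults_static(total_replicas)      # 3f/2 = 3
dynamic_bound = max_faults_dynamic(f)                 # 3f-1 = 5
survivors = total_replicas - dynamic_bound            # 2 replicas still operating
```

For f=2 the same seven replicas tolerate 3 faults with a static quorum and 5 with substitution and reconfiguration, provided faults do not strike while a reconfiguration is in progress.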
Dynamic Paxos

Reconfiguration request must be totally ordered with respect to regular application requests
A reconf request includes both new membership and quorum definition
Replicas in the new membership should not accept msgs unrelated to reconf from replicas that have been excluded from the membership
External replicas should not be allowed to participate in the consensus step
Replica mistakenly excluded can join via reconfiguration
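
The membership rule above amounts to a filter applied to every incoming message. In this sketch the message type label `"RECONF"` and the function name `should_process` are hypothetical; processing a reconfiguration-related message from an outsider only lets it ask to join and does not give it a vote in consensus.

```python
from typing import Set


def should_process(sender: str, msg_type: str, membership: Set[str]) -> bool:
    """Decide whether a replica in the current membership handles a message."""
    if sender in membership:
        return True
    # Excluded or external replicas may only send reconfiguration-related msgs,
    # e.g. a mistakenly excluded replica asking to rejoin.
    return msg_type == "RECONF"


membership = {"R0", "R1", "R2", "R3", "S1"}    # R4 has been replaced by S1
```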
Figure: Reconfiguration example with active replicas R0, R1, R2, R3, R4 and spare replicas S0, S1. Panels, in order: Initial Configuration (f=2); R4 failed; Reconfigured (R4 replaced by S1); R3 failed; Reconfigured (R3 replaced by S0); R2 failed; R1 failed (can still form quorum); Reconfigured (from f=2 to f=1); S0 failed (can still form quorum, but no longer tolerate any additional fault).
Total Number of Faults Tolerated with Reconfiguration: 5
(Total Number of Faults Tolerated without Reconfiguration: 3)
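
The sequence in the figure can be replayed with a small model of a dynamic quorum. This is one reading of the figure, not a definitive reconstruction: the class `Configuration` is hypothetical, the quorum is assumed to be a simple majority of the current membership, and the final membership {R0, S0, S1} after the switch from f=2 to f=1 is inferred from the panel labels.

```python
class Configuration:
    """Current membership, with a majority quorum over that membership."""

    def __init__(self, members):
        self.members = set(members)
        self.failed = set()

    def quorum_size(self):
        return len(self.members) // 2 + 1

    def can_form_quorum(self):
        return len(self.members - self.failed) >= self.quorum_size()

    def fail(self, replica):
        self.failed.add(replica)

    def reconfigure(self, new_members):
        # The reconfiguration request must itself be ordered by the current quorum.
        if not self.can_form_quorum():
            raise RuntimeError("cannot order the reconfiguration request")
        self.members = set(new_members)
        self.failed &= self.members    # forget failures of replicas no longer included


cfg = Configuration({"R0", "R1", "R2", "R3", "R4"})    # initial configuration, f=2
cfg.fail("R4")
cfg.reconfigure({"R0", "R1", "R2", "R3", "S1"})        # R4 replaced by S1
cfg.fail("R3")
cfg.reconfigure({"R0", "R1", "R2", "S0", "S1"})        # R3 replaced by S0
cfg.fail("R2")
cfg.fail("R1")
ok_after_four_faults = cfg.can_form_quorum()           # 3 of 5 members still up
cfg.reconfigure({"R0", "S0", "S1"})                    # from f=2 to f=1
cfg.fail("S0")
ok_after_five_faults = cfg.can_form_quorum()           # 2 of 3 members still up
cfg.fail("S1")
ok_after_six_faults = cfg.can_form_quorum()            # only R0 left
```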
Cheap Paxos

Cheap Paxos is a special instance of Dynamic Paxos
Aims to minimize involvement of spare replicas
Enables the use of f+1 active replicas to tolerate f faults, provided that sufficient spares are available (f or more)
Active replicas are referred to as main replicas
Spare replicas are referred to as auxiliary replicas
Cheap Paxos

Primary quorum
Consists of all active replicas
Secondary quorum
Must be formed by the majority of combined replicas
Consists of at least one main replica => Ensures intersection between primary and secondary quorums
Question: what if only one active replica is left?
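
The two quorum definitions can be written as predicates over sets of replica ids. The function names are hypothetical, and "majority of combined replicas" is taken to mean strictly more than half of the main and auxiliary replicas together.

```python
from typing import Iterable


def is_primary_quorum(quorum: Iterable[str], main: Iterable[str]) -> bool:
    """The primary quorum consists of exactly all main (active) replicas."""
    return set(quorum) == set(main)


def is_secondary_quorum(quorum: Iterable[str], main: Iterable[str],
                        aux: Iterable[str]) -> bool:
    """A majority of main + auxiliary replicas that includes at least one main replica."""
    q, main_set = set(quorum), set(main)
    everyone = main_set | set(aux)
    return q <= everyone and len(q) > len(everyone) / 2 and bool(q & main_set)


main = {"R0", "R1", "R2"}
aux = {"S0", "S1"}
example = {"R0", "S0", "S1"}    # one main replica plus both auxiliary replicas
```

Because the primary quorum contains every main replica and every secondary quorum contains at least one, any primary quorum and any secondary quorum share a replica.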
Figure: Quorum formation with main replicas R0, R1, R2 and auxiliary replicas S0, S1. Panels: primary quorum formation; an example secondary quorum formation; another example secondary quorum formation.
Figure: Quorums after failures, with main replicas R0, R1, R2 and auxiliary replicas S0, S1. Panels, in order: R2 failed; Reconfigured; R1 failed; Reconfigured. The panels mark the primary quorum and the secondary quorum.
Cheap Paxos

Upon detection of the failure of an active replica, a reconfiguration request is issued
New primary quorum still consists of all surviving active replicas
When reconfig request is executed, switch to new configuration
Auxiliary replicas are not bothered unless a reconfiguration is necessary
What if the primary fails => view change
Cheap Paxos: View Change

For history information: new primary must collect info from every active replica except the old primary
For approval of the primary role, the new primary must collect votes from all surviving active replicas plus one or more auxiliary replicas => a secondary quorum
The secondary quorum is used to complete all Paxos instances started by old primary but not yet completed
Cheap Paxos

Replicas in secondary quorum must propagate their knowledge to all replicas prior to moving back to primary quorum
So that auxiliary replicas do not have to keep all requests and control msgs
How to achieve this
Primary sends its latest state to all replicas not in secondary quorum
Main replica (if it is not in secondary quorum) requests retransmission of missing msgs
Auxiliary replica keeps new info from primary, purges old data, and acks the primary
Primary resumes ordering requests after it receives ack from every replica
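
The handshake above can be sketched with two small classes. Both are hypothetical, and the sketch makes an assumption the slide leaves open: the replicas the primary waits on are passed in by the caller, here taken to be the replicas outside the secondary quorum.

```python
from typing import Dict, Iterable, Set


class AuxiliaryReplica:
    def __init__(self) -> None:
        self.stored: Dict[int, str] = {}    # seq# -> info kept by this replica

    def on_latest_state(self, latest: Dict[int, str]) -> str:
        """Keep the new info from the primary, purge everything older, and ack."""
        self.stored = dict(latest)
        return "ACK"


class PropagationRound:
    """Primary-side bookkeeping: ordering resumes once every notified replica has acked."""

    def __init__(self, notified: Iterable[str]) -> None:
        self.pending: Set[str] = set(notified)

    def on_ack(self, replica_id: str) -> None:
        self.pending.discard(replica_id)

    def may_resume_ordering(self) -> bool:
        return not self.pending


all_replicas = {"R0", "R1", "S0", "S1"}
secondary_quorum = {"R0", "R1", "S0"}
round_ = PropagationRound(all_replicas - secondary_quorum)   # only S1 must be notified

s1 = AuxiliaryReplica()
s1.stored = {7: "old record"}
reply = s1.on_latest_state({9: "latest record"})
blocked_before_ack = not round_.may_resume_ordering()
round_.on_ack("S1")
```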
Example

Figure: Participants: Client, Main Replica 0 (Primary), Main Replica 1 (Backup), Main Replica 2 (Backup), Auxiliary Replica. Messages: REQUEST, P2a, P2b, COMMIT, REPLY. Phases: Accept Phase, Learning Phase, Execution.

Example

Figure: Participants: Client, Main Replica 0 (Primary), Main Replica 1 (Backup), Main Replica 2 (Backup), Auxiliary Replica. Labels: REQUEST, P2a, P2b, Using Primary Quorum (R0, R1, R2), Timed out primary quorum, Switched to secondary quorum, P2a.

Example

Figure: Labels: Timed out primary quorum, Switched to secondary quorum, P2a, P2b, Using Secondary Quorum (R0, R1, S), COMMIT, Execution, REPLY, P2a (reconfig), P2b (reconfig).

Example

Figure: Labels: Using new Primary Quorum (R0, R1), REQUEST, P2a, P2b, COMMIT, Execution, REPLY.