PDCAT04

A Fault-Tolerant
h-out of-k Mutual Exclusion Algorithm
Using Cohorts Coteries for Distributed Systems
Presented by
Jehn-Ruey Jiang
National Central University
Taiwan, R. O. C.
Distributed Systems

A distributed system consists of interconnected,
autonomous nodes which communicate with each other
by passing messages.
Interconnected Network
Mutual Exclusion
• A node in the system may occasionally need to enter the critical section (CS) to access a shared resource, such as a shared file or a shared table.
• The problem of controlling the nodes so that the shared resource is accessed by at most one node at a time is called the mutual exclusion problem.
Mutual Exclusion Example
in CS
k-Mutual Exclusion
• If there are k, k ≥ 1, identical copies of a shared resource, such as a k-user software license, then at most k nodes can access the resources at a time.
• This raises the k-mutual exclusion problem.
k-Mutual Exclusion Example
k=2
in CS
in CS
h-out of-k Mutual Exclusion
• On some occasions, a node may need to access h (1 ≤ h ≤ k) copies out of the k shared resources at a time; for example, a node may need h disks from a pool of k disks to proceed.
• The problem of controlling the nodes so that each acquires the desired number of resources while the total number of resources accessed concurrently does not exceed k is called the h-out of-k mutual exclusion problem or the h-out of-k resource allocation problem.
h-out of-k Mutual Exclusion Example
k=3
in CS (h=2)
in CS (h=1)
h-out of-k Mutual Exclusion Example
k=3
in CS (h=1)
in CS (h=1)
Related Work
• Raynal (1991): uses request broadcast
• Baldoni et al. (1998): uses k-arbiters
• Manabe et al. (2004): uses (h, k)-arbiters
• Jiang (2004): uses k-coteries
Jiang’s Algorithm
• Among the four algorithms, only Jiang’s algorithm using k-coteries is fault-tolerant.
• It can tolerate node and/or network link failures, even when the failures lead to network partitioning.
• It has lower message cost than the others.
k-Coterie
A collection of sets (called quorums) satisfying the following
properties:
1. Intersection Property: There are at most k pairwise
disjoint quorums.
2. Non-intersection Property: For any h (< k) pairwise
disjoint quorums Q1,...,Qh, there exists a quorum Qh+1 such
that Q1,...,Qh+1 are pairwise disjoint.
3. Minimality Property: No quorum is a superset of another quorum.
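On small examples the three properties can be checked by brute force. The following is a minimal Python sketch, assuming quorums are given as sets of node IDs; the function names are illustrative and do not come from the original work.

```python
from itertools import combinations

def pairwise_disjoint(sets):
    """True if no two sets in the collection share an element."""
    return all(a.isdisjoint(b) for a, b in combinations(sets, 2))

def is_k_coterie(quorums, k):
    """Brute-force check of the three k-coterie properties."""
    quorums = [frozenset(q) for q in quorums]
    # 1. Intersection property: no k+1 pairwise disjoint quorums exist.
    if any(pairwise_disjoint(c) for c in combinations(quorums, k + 1)):
        return False
    # 2. Non-intersection property: any h < k pairwise disjoint quorums
    #    can be extended by one more quorum disjoint from all of them.
    for h in range(1, k):
        for chosen in combinations(quorums, h):
            if pairwise_disjoint(chosen):
                used = frozenset().union(*chosen)
                if not any(q.isdisjoint(used) for q in quorums):
                    return False
    # 3. Minimality property: no quorum is a proper superset of another.
    return not any(a < b for a in quorums for b in quorums)
```

For instance, the majority sets {1, 2}, {2, 3}, {1, 3} pass the check for k = 1 but fail it for k = 2, because no quorum is disjoint from {1, 2}.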
Basic Idea of Jiang’s Alg.
• A node should select h mutually disjoint sets and collect permissions from all the nodes of the h sets to enter CS for accessing h resources.
• To make the algorithm fault-tolerant, a node that fails to gather enough permissions to enter CS within a time-out period must repeatedly reselect h mutually disjoint sets and gather permissions incrementally.
Drawbacks of Jiang’s Alg.
• First, it does not specify explicitly how a node can efficiently select and reselect h mutually disjoint sets.
• Second, when there is contention, a low-priority node always yields its gathered permissions to high-priority nodes, which causes higher message overhead and may prevent nodes from entering CS concurrently.
Overview of the Proposed Alg.

• Using a specific k-coterie: the cohorts coterie
• Having constant message cost in the best case
• A candidate to achieve the highest availability among all the algorithms using k-coteries
• Achieving k-concurrency by the pre-release action and conditional inquiring
Cohorts Structure Coh(k, m)
A cohorts structure Coh(k, m) = (C1,...,Cm), m ≥ k, is a list of sets, where each set Ci is called a cohort. The cohorts structure Coh(k, m) should observe the following three properties:
P1. |C1| = k.
P2. ∀i: 1 < i ≤ m: |Ci| > 2k-2, for k > 1 (|Ci| > 1, for k = 1).
P3. ∀i, j: 1 ≤ i, j ≤ m, i ≠ j: Ci ∩ Cj = ∅.
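The three properties translate directly into a validity check. This is a minimal sketch, assuming a cohorts structure is passed as a Python list of sets; the function name and the sample structures in the usage note are hypothetical.

```python
from itertools import combinations

def is_cohorts_structure(cohorts, k):
    """Check P1-P3 for a list of sets (C1, ..., Cm)."""
    # P1: the first cohort has exactly k members.
    if not cohorts or len(cohorts[0]) != k:
        return False
    # P2: every later cohort is strictly larger than 2k-2 (or than 1 when k = 1).
    strict_lower_bound = 2 * k - 2 if k > 1 else 1
    if any(len(c) <= strict_lower_bound for c in cohorts[1:]):
        return False
    # P3: cohorts are pairwise disjoint.
    return all(a.isdisjoint(b) for a, b in combinations(cohorts, 2))
```

For example, ({1, 2}, {3, 4, 5}, {6, 7, 8}) is accepted for k = 2, while ({1, 2}, {3, 4}) is rejected because the second cohort violates P2.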
Quorum under Coh(k, m)
A set Q is said to be a quorum under Coh(k, m) if some
cohort Ci in Coh(k, m) is Q's primary cohort, and each
cohort Cj, j > i, is Q's supporting cohort.
• C is Q's primary cohort if |Q ∩ C| = |C| - (k-1)
• C is Q's supporting cohort if |Q ∩ C| = 1
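The definition can be turned into an enumeration of quorums. The sketch below is illustrative only: it assumes that a quorum contains no nodes from cohorts in front of its primary cohort, and the structure Coh(2, 2) = ({1, 2}, {3, 4, 5}) used in the usage note is a made-up example rather than the one on the slides.

```python
from itertools import combinations, product

def quorums_under(cohorts, k):
    """Yield every quorum under the cohorts structure (C1, ..., Cm)."""
    for i, primary in enumerate(cohorts):
        need = len(primary) - (k - 1)      # |Q ∩ Ci| for the primary cohort Ci
        rear = cohorts[i + 1:]             # supporting cohorts Cj, j > i
        for core in combinations(sorted(primary), need):
            # pick exactly one node from each supporting cohort
            for picks in product(*[sorted(c) for c in rear]):
                yield frozenset(core) | frozenset(picks)
```

With `quorums_under([{1, 2}, {3, 4, 5}], 2)` the quorums are the three 2-subsets of {3, 4, 5} (primary cohort C2) and the six pairs made of one node of {1, 2} and one node of {3, 4, 5} (primary cohort C1, supporting cohort C2).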
Construction of Quorums under Coh(2, 4)
[Figure: nodes 1 to 14 arranged in cohorts; a quorum takes one primary cohort with supporting cohorts at rear]
• E.g.: {5, 9, 10, 12} and {1, 2, 4, 8, 14}
• No further disjoint quorum can be formed.
Six Types of Messages
• REQUEST
• LOCKED
• RELEASE
• PRE-RELEASE
• INQUIRE
• RELINQUISH
Comparison: Maekawa’s
algorithm uses the following
six messages:
•REQUEST
•LOCKED
•FAILED
•RELEASE
•INQUIRE
•RELINQUISH
Requesting h Resources
• When a node u wants to enter CS to access h resources, u should invoke Get_Quorum(h, k, (C1,...,Cm)) and wait for it to return.
• h, k: integers
• (C1,...,Cm): cohorts structure
Function Get_Quorum(h, k: Integer; (C1,...,Cm): Cohorts Structure): Set;
Var R, S: Set; g: Integer;
  g = h;  // g: the number of primary cohorts still needed
  R = ∅;  // R: the set of replying nodes that will be returned
  For (i = m,...,2) Do
    S = Probe(Ci, g);
    If |S| = |Ci|-(k-1)+(g-1)
      Then {R = R ∪ S; g = g-1; If g = 0 Then Return R;}
      Else If |S| = g Then R = R ∪ S;
  EndFor
  S = Probe(C1, g);  // C1 is the primary cohort of g quorums
  R = R ∪ S;
  Return R;
End Get_Quorum
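The pseudocode above can be exercised with a small Python sketch. It assumes that Probe is supplied as a callback `probe(cohort, g, i)`; the extra index argument `i`, the helper `all_reply_probe`, and the sample cohorts are assumptions made for illustration, not part of the original algorithm.

```python
def get_quorum(h, k, cohorts, probe):
    """Sketch of Get_Quorum: cohorts = [C1, ..., Cm], probe returns replying nodes."""
    g = h                      # number of primary cohorts still needed
    result = set()             # replying nodes gathered so far
    for i in range(len(cohorts), 1, -1):          # i = m, ..., 2
        cohort = cohorts[i - 1]
        s = probe(cohort, g, i)
        if len(s) == len(cohort) - (k - 1) + (g - 1):
            result |= s        # Ci is primary for one quorum, supporting for g-1
            g -= 1
            if g == 0:
                return result
        elif len(s) == g:
            result |= s        # Ci is only a supporting cohort for g quorums
    result |= probe(cohorts[0], g, 1)             # C1 is primary for the rest
    return result

def all_reply_probe(k):
    """Hypothetical best-case probe: every node of the cohort replies at once."""
    def probe(cohort, g, i):
        size = g if i == 1 else len(cohort) - (k - 1) + (g - 1)
        return set(sorted(cohort)[:size])
    return probe
```

With `cohorts = [{1, 2}, {3, 4, 5}, {6, 7, 8}]` and k = 2, a request for h = 1 is served from the rear cohort alone, and a request for h = 2 uses the two rear cohorts; C1 is reached only when the rear cohorts cannot supply enough primary cohorts.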
Probe(Ci, g)
• The function Probe(Ci, g) invoked in Get_Quorum requests all the nodes in set Ci for their exclusive permissions.
• A node can grant its permission to only one requesting node at a time.
• After a network turn-around time, Probe(Ci, g) returns a set S of replying nodes of Ci in one of three cases:
Case 1 for Probe(Ci, g) to return
• If i > 1 and there are at least |Ci|-(k-1)+(g-1) replying nodes, the returned set is a set of |Ci|-(k-1)+(g-1) replying nodes.
• In this case, Ci can be the primary cohort of one quorum and the supporting cohort of g-1 quorums concurrently.
Case 2 for Probe(Ci, g) to return
• If i > 1 and there are at least g but fewer than |Ci|-(k-1)+(g-1) replying nodes, the returned set is a set of g replying nodes. (Note that |Ci|-(k-1)+(g-1) > g because |Ci| > 2k-2 for k > 1, or |Ci| > 1 for k = 1.)
• In this case, Ci can only be the supporting cohort of g quorums.
Probe(Ci, g) waits
• Note that Probe(Ci, g) postpones its return if none of the three cases holds, that is, when too few nodes in Ci can grant their permissions immediately.
Case 3 for Probe(Ci, g) to return
• If i = 1 and there are at least g replying nodes, the returned set is a set of g replying nodes.
• Because |C1| = k, a single node suffices to make C1 the primary cohort of a quorum (|C1|-(k-1) = 1).
• Thus, g replying nodes can make C1 the primary cohort of g quorums in this case.
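The three return cases can be summarized as a single decision function. This is a sketch under two assumptions that are not stated on the slides: "more than" is read as "at least", and when more nodes reply than are needed, the nodes with the smallest IDs are kept. The name `probe_result` is hypothetical.

```python
def probe_result(replying, cohort, k, g, i):
    """Return the set S chosen by Cases 1-3, or None if Probe must keep waiting."""
    ready = sorted(replying & cohort)              # nodes of Ci that granted permission
    full = len(cohort) - (k - 1) + (g - 1)
    if i > 1 and len(ready) >= full:               # Case 1: primary + supporting
        return set(ready[:full])
    if len(ready) >= g:                            # Case 2 (i > 1) or Case 3 (i = 1)
        return set(ready[:g])
    return None                                    # too few replies: postpone return
```

For a cohort {3, 4, 5} with k = 2 and g = 1, three replies give a two-node set (Case 1), a single reply gives a one-node set (Case 2), and no reply makes the probe wait.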
Pre-release
• Probe(Ci, g) executes the pre-release action before returning S; that is, it sends messages to the nodes in Ci – S to release their permissions in advance.
• The pre-release action plays an important role in the proposed algorithm. As we will show later, the action allows more nodes to be in CS concurrently and is related to the deadlock-free property.
On Receiving REQUEST
• On receiving a REQUEST from node u, a node v checks whether it is currently locked for another REQUEST. If not, v marks itself locked, sets u as the locker, records the number h of resources that u requests, and sends a LOCKED message to u.
Two Local Priority Queues
• R-QUEUE: On receiving a REQUEST from node u, if v is locked for a REQUEST from another node w (w is the locker), the REQUEST from node u is inserted into R-QUEUE.
• P-QUEUE: On receiving a PRE-RELEASE message from node u, node v inserts the message into P-QUEUE.
Timestamp
The timestamp is a pair (u, t), where u is the node ID of
the requester and t is the sequence number of the
message.
A REQUEST with timestamp (u, t) is assumed to precede (to have higher priority than) another REQUEST with timestamp (u', t') if (t < t') or (t = t' ∧ u < u').
A node always assigns a message sequence number that is one more than the largest message sequence number it has ever seen.
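The ordering rule and the sequence-number rule can be sketched as follows. The names `precedes` and `SequenceClock`, and the initial sequence number 0, are assumptions made for illustration.

```python
def precedes(a, b):
    """True if timestamp a = (u, t) has higher priority than b = (u2, t2)."""
    (u, t), (u2, t2) = a, b
    return t < t2 or (t == t2 and u < u2)

class SequenceClock:
    """Tracks the largest sequence number seen so far by one node."""

    def __init__(self):
        self.largest_seen = 0              # assumed starting value

    def observe(self, seq):
        """Record the sequence number carried by a received message."""
        self.largest_seen = max(self.largest_seen, seq)

    def next(self):
        """Return a new sequence number, one more than the largest seen."""
        self.largest_seen += 1
        return self.largest_seen
```

A smaller sequence number always wins; node IDs only break ties between requests that carry the same sequence number.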
Conflict Condition
• We say that the conflict condition holds if h1+…+hx+hw > k for REQUEST messages R1,…,Rx in R-QUEUE and P-QUEUE preceding Rw (the REQUEST from the locker w), where h1,…,hx, and hw are the numbers of resources requested by R1,…,Rx, and Rw.
• Node w and the nodes sending the messages R1,…,Rx are called conflicting nodes.
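The conflict condition amounts to a sum over the requests that precede the locker's request. A minimal sketch, assuming each request is represented as a tuple (sequence number, node ID, h) so that Python's tuple ordering matches the timestamp ordering; the representation and the function name are illustrative.

```python
def conflict_holds(queued, locker, k):
    """queued: requests (seq, node_id, h) found in R-QUEUE and P-QUEUE;
    locker: the request (seq, node_id, h) of the current locker w."""
    locker_stamp = (locker[0], locker[1])
    # keep only the demands of requests that precede the locker's request
    preceding = [h for (seq, node_id, h) in queued if (seq, node_id) < locker_stamp]
    return sum(preceding) + locker[2] > k
```

With k = 3 and a locker asking for 2 resources, an earlier request for 2 resources triggers the condition (2 + 2 > 3), whereas an earlier request for 1 resource does not.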
On Receiving PRE-RELEASE
• On receiving a PRE-RELEASE message from node u (u must be the locker), node v inserts the message into P-QUEUE.
• It then marks itself unlocked if R-QUEUE is empty; otherwise, it removes from R-QUEUE the node w at the front of the queue, sets w as the locker, and sends w a LOCKED message.
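The REQUEST and PRE-RELEASE handlers of a node v can be sketched as a small state machine. This is an illustrative sketch only: the class name `Arbiter`, the tuple layout (sequence number, node ID, h), and the `sent` list that stands in for the network are assumptions, and conditional inquiring is left out.

```python
import heapq

class Arbiter:
    """Permission state of a node v (sketch, not the original code)."""

    def __init__(self):
        self.locker = None   # request (seq, node_id, h) that v is locked for
        self.r_queue = []    # heap of waiting requests, smallest timestamp first
        self.p_queue = []    # requests whose senders pre-released v's permission
        self.sent = []       # (message, destination) pairs v would send

    def on_request(self, seq, u, h):
        if self.locker is None:
            self.locker = (seq, u, h)              # lock for u and record h
            self.sent.append(("LOCKED", u))
        else:
            heapq.heappush(self.r_queue, (seq, u, h))

    def on_pre_release(self, u):
        if self.locker is None or self.locker[1] != u:
            raise ValueError("PRE-RELEASE must come from the locker")
        self.p_queue.append(self.locker)
        if self.r_queue:
            self.locker = heapq.heappop(self.r_queue)   # front of R-QUEUE
            self.sent.append(("LOCKED", self.locker[1]))
        else:
            self.locker = None                          # mark v unlocked
```

A heap keyed on (sequence number, node ID) keeps the highest-priority REQUEST at the front of R-QUEUE, so the hand-over after a PRE-RELEASE always goes to the request that precedes all others.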
Conditional Inquiring

• If the conflict condition holds, node v then sends an INQUIRE message to node w.
• It is not necessary to send the INQUIRE message if an INQUIRE has already been sent to w and w has not yet sent RELINQUISH or RELEASE (we will explain the two messages later).
On receiving INQUIRE
• When node w receives an INQUIRE message from node v, it replies with a RELINQUISH message to cancel its lock if it is not in CS.
• Otherwise, it replies with a RELEASE message, but only after it exits CS. If an INQUIRE message arrives after w has sent a RELEASE message, it is simply ignored.
On receiving RELINQUISH
• On receiving a RELINQUISH message from w (w must be the locker), node v swaps w with u, sets u as the locker, and sends a LOCKED message to u, where u is the node at the front of R-QUEUE.
Effective Permission
• Node u is said to have an effective permission of node v if u has received LOCKED from v and has not sent a corresponding RELINQUISH or RELEASE to v.
Entering CS
• Get_Quorum invoked by node u will eventually return a set R, which is the union of h pairwise disjoint cohorts quorums.
• Node u can enter CS and access h resources if it has effective permissions of all nodes in R.
• If u does not have effective permissions of all nodes in R, i.e., u has sent RELINQUISH messages to some nodes v1,...,vi of R, 1 ≤ i ≤ |R|, then u must wait for LOCKED messages from v1,...,vi before entering CS.
Exiting CS
• After exiting CS, node u should send a RELEASE message to all the nodes to which u has sent REQUEST.
On receiving RELEASE
• On receiving a RELEASE message from node u (u may or may not be the locker), node v removes u’s PRE-RELEASE message from P-QUEUE if it is there.
• Node v marks itself unlocked if R-QUEUE is empty; otherwise, it removes from R-QUEUE the REQUEST message of node w, sets w as the locker, and sends w a LOCKED message, where w is the node whose REQUEST is at the front of R-QUEUE.

Analysis
• Message cost:
• Best case: 4ch, where c is the cohort size, c > 2k-2: one REQUEST, LOCKED, and RELEASE for each node in R, and one PRE-RELEASE for the nodes in (Cm ∪ … ∪ C(m-h+1)) - R.
• Worst case: 7n, where n is the number of nodes: one REQUEST, INQUIRE, RELINQUISH, LOCKED, RELEASE, LOCKED for each node, and one RELEASE for those not in R. (It occurs only when there are conflicting nodes.)
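The best-case count can be worked through on a small instance. The sketch below follows the message breakdown given above literally; the function names and the sample sets are illustrative assumptions, and all cohorts are assumed to share the same size c.

```python
def best_case_messages(R, probed_cohorts):
    """3 messages per node of R plus 1 PRE-RELEASE per probed node outside R."""
    probed = set().union(*probed_cohorts)
    return 3 * len(R) + len(probed - R)

def best_case_bound(c, h):
    """Best-case bound 4ch for cohort size c and h requested resources."""
    return 4 * c * h

def worst_case_bound(n):
    """Worst-case bound 7n for n nodes."""
    return 7 * n
```

For k = 2, h = 2 and two probed cohorts {6, 7, 8} and {3, 4, 5} with R = {3, 4, 6, 7, 8}, the count is 3·5 + 1 = 16 messages, which stays below the bound 4ch = 24 and does not depend on the total number of nodes.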
Comparison
Algorithm | Message complexity | k-Concurrency | Fault-Tolerance
Raynal’s algorithm [10] | between 2(n-1) and 3(n-1) | yes | no
The algorithm using k-arbiters [2] | between 3q and (3h+3)q, where q = (k+1)·n^(k/(k+1)) for the (k+1)-cube arbiter and q = ⌊k·n/(k+1)⌋+1 for the uniform k-arbiter | yes | no
The algorithm using (h,k)-arbiters [9] | between 3q and (3h+3)q, where q = (k+2-h)·n^((k+1-h)/(k+1)) for the (k+1)-cube (h,k)-arbiter and q = ⌊k·n/(k+h)⌋+1 for the uniform (h,k)-arbiter | yes | no
Jiang’s algorithm [5] | between 3hq and 6en, where q is the quorum size of the k-coterie used, and 0 < e ≤ 1 | no | yes
The proposed algorithm | between 4ch and 7n, where c > 2k-2 | yes | yes (maybe of the highest availability)
*n stands for the number of nodes, and h stands for the number of requested resources.
Conclusion
• The proposed algorithm becomes a k-mutual exclusion algorithm for k > h = 1, and becomes a mutual exclusion algorithm for k = h = 1.
• It is resilient to node and/or link failures and has constant message cost in the best case.
• It is a candidate to achieve the highest availability among all the algorithms using k-coteries since the cohorts coterie is ND (nondominated).
• It has the k-concurrency property, which guarantees that a low-priority node is not postponed by a high-priority node when there are no conflicting nodes.

Thanks!!
Dominated k-Coteries
• Let C and D be two distinct k-coteries. C is said to dominate D if and only if every quorum in D is a superset of some quorum in C (i.e., ∀Q, ∃Q': Q ∈ D, Q' ∈ C: Q' ⊆ Q).
• Obviously, the dominating one (C) has more chances than the dominated one (D) to have available quorums.
• A quorum is said to be available if all of its members (nodes) are up.
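Domination and availability can both be checked with subset tests. A minimal sketch, assuming coteries are given as collections of sets of node IDs; the function names and the two sample 1-coteries in the usage note are illustrative.

```python
def dominates(C, D):
    """True if C and D are distinct and every quorum of D contains a quorum of C."""
    C = {frozenset(q) for q in C}
    D = {frozenset(q) for q in D}
    return C != D and all(any(qc <= qd for qc in C) for qd in D)

def has_available_quorum(coterie, up_nodes):
    """True if some quorum consists only of nodes that are up."""
    up = set(up_nodes)
    return any(set(q) <= up for q in coterie)
```

For example, C = {{1}} dominates D = {{1, 2}}: when only node 1 is up, C still has an available quorum while D has none.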
Nondominated k-Coteries
• Since an available quorum implies an available entry to CS, we should always concentrate on ND (nondominated) k-coteries, which no other k-coterie can dominate.
• An algorithm using ND k-coteries, such as the proposed algorithm, is a candidate to achieve the highest availability.