Distributed Systems
31. Theoretical Foundations of
Distributed Systems - Coordination
Simon Razniewski
Faculty of Computer Science
Free University of Bozen-Bolzano
A.Y. 2015/2016
Co-ordination Algorithms
are fundamental in distributed systems:
• to dynamically re-assign the role of master
– choose primary server after crash
– co-ordinate resource access
• for resource sharing: concurrent updates of
– entries in a database (data locking)
– files
– a shared repository
• to reach agreement:
– whether to commit or abort a database transaction
– on the readings from a group of sensors
Co-ordination Problems
1. Clock Synchronization
– processes must agree on order of events
– crucial e.g. for concurrent transactions in databases or
change polling in distributed file systems
2. Leader election
– after a crash failure has occurred
– after network reconfiguration
3. Mutual exclusion
– distributed form of synchronized access problem
– not covered here
1. CLOCK SYNCHRONIZATION
Why synchronize clocks?
• Important to end users
– eBay auctions
– Enrolment for sport courses/student dorms
– Order of chat/mail messages
• Important internally in distributed systems
– Concurrent updates in distributed databases
– Distributed file systems
End Users
• Synchronize with an authoritative source
• Network Time Protocol (NTP): UDP on port 123
• “In operation since before 1985, NTP is one of
the oldest Internet protocols in current use.”
• How?
Network Time Protocol (NTP)
• e.g. time.windows.com
[Screenshot: Wireshark capture of NTP traffic]
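A minimal sketch of how a client can estimate its clock offset from a single request/response exchange, using the four timestamps of the standard NTP calculation. The function name and the sample values are illustrative assumptions, not taken from the slides.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """t0: client send, t1: server receive, t2: server send, t3: client receive.

    t0 and t3 are read from the client clock, t1 and t2 from the server clock.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2   # estimated (server clock - client clock)
    delay = (t3 - t0) - (t2 - t1)          # round-trip time spent on the network
    return offset, delay

# Hypothetical exchange: the client clock is 5 s behind the server,
# each direction takes 0.1 s, and the server needs 0.02 s to answer.
offset, delay = ntp_offset_and_delay(t0=100.0, t1=105.1, t2=105.12, t3=100.22)
print(round(offset, 3), round(delay, 3))
```

The offset estimate is exact only if the network delay is the same in both directions, which is the assumption built into the formula.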
System internal
• The exact time is not necessarily important
• What matters is the relative ordering of events
• Database updates
• File system changes
Distributed Bank Database
A bank keeps replicas of bank accounts in Milan and Rome
• Event 1:
Customer pays 100 € into his account of 1000 €
• Event 2:
The bank adds 1% interest
Info is broadcast to Milan and Rome
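A small worked example of why the order matters. Which replica sees which order is an assumption made here for illustration; amounts are kept in whole euros.

```python
def deposit(balance):
    return balance + 100            # Event 1: customer pays in 100 EUR

def add_interest(balance):
    return balance * 101 // 100     # Event 2: bank adds 1% interest

# Assumed arrival orders: Milan sees Event 1 first, Rome sees Event 2 first.
milan = add_interest(deposit(1000))
rome = deposit(add_interest(1000))
print(milan, rome)                  # the replicas now disagree by 1 EUR
```

Both replicas applied the same two updates, yet they end up with different balances, so the replicas must agree on a single order.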
Message Ordering
[Figure: timelines of processes X, Y, Z and A against physical time. Messages m1, m2 and m3 are sent among X, Y and Z (send and receive events numbered 1 to 4); A records receive events at times t1, t2 and t3.]
How can A know the order in which the messages were sent?
Time Ordering of Events (Lamport)
Observation:
For some events E1, E2,
it is “obvious” that E1 happened before E2
(written E1 → E2)
• If E1 happens before E2 in process P, then E1 → E2
• If E1 = send(M) and E2 = receive(M), then E1 → E2
(M is a message)
• If E1 → E2 and E2 → E3, then E1 → E3
(transitivity)
Logical Clocks
Goal: Assign “timestamps” ti to events Ei such that
E1 → E2 ⇒ t1 < t2
(partial order!)
Approach: Processes
• incrementally number their events
• send numbers with messages
• update their “logical clock” to
max(OwnTime, SendTime) +1
when they receive a message
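The approach above can be sketched as a small class. The class and method names are illustrative assumptions; only the update rule max(OwnTime, SendTime) + 1 comes from the slide.

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event: events are numbered incrementally."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned number travels with the message."""
        return self.tick()

    def receive(self, send_time):
        """On receipt: max(OwnTime, SendTime) + 1."""
        self.time = max(self.time, send_time) + 1
        return self.time

x, y = LamportClock(), LamportClock()
y.tick(); y.tick(); y.tick()        # Y has already seen three local events
ts_m1 = x.send()                    # X sends m1
recv_m1 = y.receive(ts_m1)          # Y jumps past both its own time and m1's
ts_m2 = y.send()                    # Y replies with m2
recv_m2 = x.receive(ts_m2)          # X's clock is pushed forward by m2
print(ts_m1, recv_m1, ts_m2, recv_m2)
```

In every send/receive pair the receive timestamp is larger than the send timestamp, which is exactly the property E1 → E2 ⇒ t1 < t2.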
Logical Clocks in the Message Scenario
[Figure: the message scenario with processes X, Y, Z and A, annotated with logical clock values on the send and receive events. Messages carry numbers.]
For a tie break, use process numbers as second component!
Combined with classical clocks
Three processes, each with its own clock.
The clocks run at different rates.
Combined with classical clocks (2)
Lamport’s algorithm corrects the clocks.
Lamport’s Logical Clocks (4)
• Figure 6-10. The positioning of Lamport’s logical clocks in distributed systems.
Back to the distributed database…
1. On receive: queue messages by their logical sending timestamp
2. Acknowledge receipt of each message to all other peers
3. Process a message only once all other peers have acknowledged it
Result: a unique processing order for all peers
(“totally-ordered multicast”)
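A simplified simulation of the three steps for the two bank replicas. The `Peer` class, the timestamps and the message texts are assumptions for illustration. The simulation also assumes that all messages arrive before any acknowledgement, which stands in for the FIFO channels and clock updates that a real implementation would need. Ties between equal timestamps are broken by the sender's name.

```python
import heapq

class Peer:
    def __init__(self, pid, all_pids):
        self.pid = pid
        self.others = set(all_pids) - {pid}
        self.queue = []       # heap of (timestamp, sender, payload)
        self.acks = {}        # (timestamp, sender) -> peers that acknowledged
        self.processed = []

    def on_message(self, ts, sender, payload):
        heapq.heappush(self.queue, (ts, sender, payload))  # step 1: queue by timestamp
        return (ts, sender)                                 # step 2: id to acknowledge

    def on_ack(self, msg_id, from_pid):
        self.acks.setdefault(msg_id, set()).add(from_pid)

    def deliver_ready(self):
        # step 3: process the head only once all other peers acknowledged it
        while self.queue:
            ts, sender, payload = self.queue[0]
            if not self.others <= self.acks.get((ts, sender), set()):
                break
            heapq.heappop(self.queue)
            self.processed.append(payload)

pids = ["Milan", "Rome"]
peers = {p: Peer(p, pids) for p in pids}
msgs = [(1, "Milan", "deposit 100"), (1, "Rome", "add 1% interest")]
arrival = {"Milan": msgs, "Rome": list(reversed(msgs))}  # different arrival orders

acks = []
for pid, incoming in arrival.items():
    for ts, sender, payload in incoming:
        acks.append((peers[pid].on_message(ts, sender, payload), pid))
for msg_id, from_pid in acks:
    for peer in peers.values():
        if peer.pid != from_pid:
            peer.on_ack(msg_id, from_pid)
for peer in peers.values():
    peer.deliver_ready()

print(peers["Milan"].processed, peers["Rome"].processed)
```

Although the two replicas receive the updates in opposite orders, both process them in the same order.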
2. LEADER ELECTION
Leader Election
• The problem:
– N processes, may or may not have unique IDs (UIDs)
– Old coordinator has crashed/disappeared due to network issues
– Processes must choose a new unique coordinator among themselves
– One or more processes might start the election simultaneously
• Qualitative properties:
– Correctness: Every process has the same value in the variable elected
– Liveness: All processes participate and eventually discover the identity of the leader
(elected cannot stay undefined).
• Quantitative properties
– Bandwidth: total number of messages sent around
– Turnaround: number of steps needed to come to a result
Election on a Ring (Chang/Roberts 1979)
• Assumptions:
– each process has a UID, UIDs are linearly ordered
• ideas how to obtain them?
– processes form a unidirectional logical ring, i.e.,
• each process has channels to two other processes
• from one it receives messages, to the other it sends messages
• Goal:
– process with highest UID becomes leader
Election on a Ring (cont'd)
Processes
• send two kinds of messages: elect(UID), elected(UID)
• can be in two states: non-participant, participant
Two phases
• Determine leader
• Announce winner
Initially, each process is non-participant
Algorithm: Determine Leader
• Some process with UID id0 initiates the election by
– becoming participant
– sending the message elect(id0) to its neighbour
• When a non-participant receives a message elect(id)
– it forwards elect(idmax), where idmax is the maximum of its own and the received UID
– becomes participant
• When a participant receives a message elect(id)
– it forwards the message if id is greater than its own UID
– it ignores the message if id is less than its own UID
Algorithm: Announce Winner
• When a participant receives a message elect(id)
where id is its own UID
– it becomes the leader
– it becomes non-participant
– it sends the message elected(id) to its neighbour
• When a participant receives a message elected(id)
– it records id as the leader’s UID
– it becomes non-participant
– it forwards the message elected(id) to its neighbour
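The two phases can be simulated as follows, assuming a single initiator and no failures. The function name, the message representation and the ring order of the sample UIDs are assumptions for illustration. The announcement stops when it returns to the leader, which is already non-participant at that point.

```python
from collections import deque

def ring_election(uids, starter):
    """Simulate the election; process i sends to process (i + 1) % n."""
    n = len(uids)
    participant = [False] * n
    elected = [None] * n
    messages = 0
    participant[starter] = True
    pending = deque([((starter + 1) % n, "elect", uids[starter])])
    while pending:
        i, kind, uid = pending.popleft()
        messages += 1
        nxt = (i + 1) % n
        if kind == "elect":
            if uid == uids[i]:                 # own UID came back: leader
                participant[i] = False
                elected[i] = uid
                pending.append((nxt, "elected", uid))
            elif not participant[i]:           # non-participant forwards the maximum
                participant[i] = True
                pending.append((nxt, "elect", max(uid, uids[i])))
            elif uid > uids[i]:                # participant forwards larger UIDs only
                pending.append((nxt, "elect", uid))
        elif uid != uids[i]:                   # "elected": record and pass on
            participant[i] = False
            elected[i] = uid
            pending.append((nxt, "elected", uid))
    return elected, messages

uids = [3, 17, 4, 24, 9, 1, 15, 28]            # assumed ring order
elected, messages = ring_election(uids, starter=0)
print(elected, messages)
```

With the initiator placed directly after the process with the highest UID, this run uses 3n - 1 messages for n = 8.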
Election on a Ring: Example
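[Figure: ring of processes with UIDs 3, 17, 4, 24, 9, 1, 15 and 28, each marked as participant or non-participant; the election message in transit carries 24.]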
Properties
• Correctness:
–
• Liveness
– clear if only one election is running
– what if several elections are running at the same time?
participants do not forward smaller IDs
• Bandwidth:
– at most 3n – 1 messages if a single process starts the election (what if several processes start an election?)
• Turnaround:
– at most 3n-1 (if …)
Under Which Conditions can it Work?
• What if there is a failure (process or connection)?
– the election gets stuck
– assumption: no failures or timeouts during election
(in token rings, nodes are connected to the network by a connector, which may pass on tokens even if the node has failed)
• When is this applicable?
– token ring/token bus/virtual ring (Chord)
Bully Algorithm (Garcia-Molina)
• Idea: Process with highest ID imposes itself as the leader
• Assumptions:
– each process has a unique ID
– each process knows the IDs of the other processes
• When is it applicable?
– IDs don't change
– Set of participants constant
– Possibly much faster than ring algorithm
Bully Algorithm: Principles
• A process detects failure of the leader
• The process starts an election by notifying the potential
candidates (i.e., processes with greater ID)
– if no candidate replies,
the process declares itself the winner of the election
– if there is a reply,
the process stops its election initiative
• When a process receives a notification
– it replies to the sender
– and starts an election if its ID is higher than the sender's
Bully Algorithm: Messages
• Election message:
– to “call elections” (sent to nodes with higher UID)
• Answer message:
– to “vote” (… against the caller, sent to nodes with lower UID)
• Coordinator message:
– to announce that the sender is now acting as coordinator
Bully Algorithm: Actions
• Initially: the process with the highest UID sends a coordinator message
• A process starting an election sends an election message
– if there is no answer within time T = 2 · T_transmission + T_process,
then it sends a coordinator message
• If a process receives a coordinator message
– it sets its coordinator variable
• If a process receives an election message
– it answers and begins another election (if needed)
• If a new process starts to coordinate (highest UID),
– it sends a coordinator message and “bullies” the current coordinator out
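A sketch of these actions as a sequential simulation. Timeouts are modelled by simply checking which processes are alive; the function name, the ID range 0 to 7 and the crashed process 7 are assumptions for illustration.

```python
def bully_election(ids, alive, initiator):
    """Return (coordinator, number of messages) for one election."""
    messages = 0
    coordinator = None
    pending = [initiator]           # processes that still have to hold an election
    started = set()
    while pending:
        p = pending.pop()
        if p in started:
            continue
        started.add(p)
        higher = [q for q in ids if q > p]
        messages += len(higher)                 # election messages to higher IDs
        answering = [q for q in higher if q in alive]
        messages += len(answering)              # answer messages back to p
        if answering:
            pending.extend(answering)           # each of them starts its own election
        else:
            coordinator = p                     # no answer within T: p wins
            messages += len(alive) - 1          # coordinator message to the others
    return coordinator, messages

ids = list(range(8))
alive = set(range(7))               # assumed: process 7, the old leader, has crashed
coordinator, messages = bully_election(ids, alive, initiator=4)
print(coordinator, messages)
```

Only the live process with the highest ID ever sees no answer, so it is the one that sends the coordinator message.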
Example (1)
(a) Process 4 holds an election.
(b) Processes 5 and 6 respond, telling 4 to stop.
(c) Now 5 and 6 each hold an election.
(d) Process 6 tells 5 to stop.
(e) Process 6 wins and tells everyone.
Properties of the Bully Algorithm
• Liveness
– guaranteed because of timeouts
• Correctness
– clear if the group of processes is stable (no new processes)
– not guaranteed if a new process declares itself the leader during an election (e.g., the old leader is restarted)
• two processes may declare themselves leader at the same time
• but no guarantee can be given on the order of delivery of those messages
Quantitative Properties
• Best case:
– process with 2nd highest ID detects failure
• Worst case:
– process with lowest ID detects failure
• Bandwidth:
– N - 1 messages in best case
– O(N^2) in worst case
• Turnaround:
– 1 message in best case
– 3 messages in worst case
Comparison
Election without UIDs (Itai/Rodeh)
• Assumptions
– N processes, unidirectional ring
– processes do not have UIDs
• Election
– each process selects ID at random from set {1,…,K}
• non-unique! but fast
– processes pass all IDs around the ring
– after one round, if there exists a unique ID, then elect the maximum unique ID
– otherwise, repeat
• Liveness?
Probabilistically!
Election without UIDs (cont'd)
• How many rounds does it take?
– the larger the probability of a unique ID, the faster the algorithm
– example: for N=4 and K=16, the expected number of rounds is about 1.01
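The figure for N=4, K=16 can be checked by brute force. The sketch assumes that each round draws fresh, independent IDs and succeeds as soon as at least one ID is held by exactly one process, so the number of rounds is geometrically distributed. The function name is an assumption.

```python
from fractions import Fraction
from itertools import product

def expected_rounds(n, k):
    """Exact expected number of rounds until some ID is unique."""
    success = 0
    for draw in product(range(1, k + 1), repeat=n):
        # the round succeeds if some value occurs exactly once
        if any(draw.count(v) == 1 for v in set(draw)):
            success += 1
    p = Fraction(success, k ** n)   # probability that one round succeeds
    return 1 / p                    # mean of a geometric distribution

rounds = expected_rounds(4, 16)
print(float(rounds))
```

The enumeration covers 16^4 = 65536 draws, so it is only practical for small N and K.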
Take-home
• Clock synchronization
– Synchronization with reference server
– Relative ordering: Lamport’s logical clocks
• Leader Election
– Bully algorithm
– Ring-based election
Appendix: Mutual Exclusion
Mutual Exclusion
Motivation: Transactions
Algorithms:
• Centralized algorithm
– Server/Leader needed
• Distributed algorithm
• Token-based algorithm
– Server/Leader needed for generating token
A Centralized Algorithm (1)
Process 1 asks the coordinator for permission to access a shared resource. Permission is granted.
A Centralized Algorithm (2)
Process 2 then asks permission to access the same
resource. The coordinator does not reply.
A Centralized Algorithm (3)
When process 1 releases the resource, it tells the
coordinator, which then replies to 2.
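The three steps of the centralized algorithm can be sketched as a coordinator object. The class and method names are assumptions; "no reply" is modelled by returning False and queueing the request.

```python
from collections import deque

class Coordinator:
    def __init__(self):
        self.holder = None          # process currently holding the resource
        self.waiting = deque()      # requests that have not been answered yet

    def request(self, pid):
        """Return True if permission is granted now, False if no reply is sent."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)
        return False

    def release(self, pid):
        """Return the process that is granted access next, or None."""
        assert pid == self.holder, "only the holder may release"
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder

c = Coordinator()
first = c.request(1)    # process 1 asks: permission is granted
second = c.request(2)   # process 2 asks: the coordinator does not reply
granted = c.release(1)  # process 1 releases: the coordinator replies to 2
print(first, second, granted)
```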
Advantages/Disadvantages?
• Correctness/Liveness
• Turnaround
• Bandwidth
A Distributed Algorithm (1)
Ricart and Agrawala, 1981
• Access requests are sent to everyone
1. If the receiver is not accessing the resource and does not want to access it, it sends back an OK message to the sender.
2. If the receiver already has access to the resource, it does not reply. Instead, it queues the request.
3. If the receiver wants to access the resource as well but has not yet done so, it compares the timestamp of the incoming message with the one contained in the message that it has sent everyone. The lowest one wins: if the incoming message has the lower timestamp, the receiver sends back OK; otherwise it queues the request.
– A logical timestamp is sufficient
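The receiver's three cases can be sketched as follows. The class, the state names and the timestamps 8 and 12 are assumptions for illustration; requests are (timestamp, process ID) pairs so that ties are broken by process ID.

```python
RELEASED, WANTED, HELD = "released", "wanted", "held"

class Process:
    def __init__(self, pid):
        self.pid = pid
        self.state = RELEASED
        self.own_request = None     # (timestamp, pid) of our pending request
        self.deferred = []          # requests queued instead of answered

    def want(self, timestamp):
        self.state = WANTED
        self.own_request = (timestamp, self.pid)
        return self.own_request     # this request is sent to everyone

    def on_request(self, request):
        """Return True to send OK now, False to queue the request."""
        if self.state == HELD or (
            self.state == WANTED and self.own_request < request
        ):
            self.deferred.append(request)
            return False
        return True

    def release(self):
        self.state, self.own_request = RELEASED, None
        granted, self.deferred = self.deferred, []
        return granted              # OK is now sent to every queued requester

p0, p1, p2 = Process(0), Process(1), Process(2)
r0, r2 = p0.want(8), p2.want(12)    # hypothetical logical timestamps
ok_for_0 = [p1.on_request(r0), p2.on_request(r0)]
ok_for_2 = [p0.on_request(r2), p1.on_request(r2)]
p0.state = HELD                     # p0 received OK from everyone
later = p0.release()                # p0 is done and answers the queued request
print(ok_for_0, ok_for_2, later)
```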
A Distributed Algorithm (2)
Two processes want to access a
shared resource at the same (global) moment.
A Distributed Algorithm (3)
Process 0 has the lowest
timestamp, so it wins.
A Distributed Algorithm (4)
When process 0 is done,
it sends an OK also, so 2 can now go ahead.
Advantages/Disadvantages?
• Correctness/Liveness
• Turnaround
• Bandwidth
A Token Ring Algorithm
• Token handed around continuously
• Token owner may access resource
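A minimal sketch of the token passing, assuming processes numbered 0 to n-1, a token that starts at process 0, and no failures. The function name and parameters are assumptions.

```python
def token_ring(n, wants, rounds=1):
    """Pass the token around the ring; only the token owner accesses the resource."""
    pending = set(wants)            # processes that want to access the resource
    accesses = []
    holder = 0
    for _ in range(rounds * n):
        if holder in pending:       # the owner uses the resource, then passes on
            pending.discard(holder)
            accesses.append(holder)
        holder = (holder + 1) % n   # hand the token to the next process
    return accesses

order = token_ring(4, wants={2, 0, 3})
print(order)
```

Access is granted in ring order rather than in request order, and the token keeps circulating even when nobody wants the resource.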
A Comparison of the Three Algorithms