Clock Management

Model
• Distributed system model:
– n processes connected by a communications network
– imperfectly synchronized clocks, thus no exact notion
of “what time it is” at other processes
– reliable message delivery
– may or may not bound message delivery time
• depending on what problem we’re examining
• depending on how practical we want solution to be
– no shared memory
Fundamental Limitations
• Distributed systems are assumed to have
no global clock
• Distributed systems have no shared
memory
• These limitations force us to be creative in
solving the following fundamental
problems (among others)
– Ordering events
– Capturing a global state
– Ensuring mutual exclusion
Time in Distributed Systems
• Three notions of time:
– Time seen by external observer: A global clock of
perfect accuracy
– Time seen on clocks of individual hosts: Each has
its own clock, and clocks may drift out of sync
– Logical time: event a occurs before event b and
this is detectable because information about a
may have reached b
• b is causally dependent on a
External Time
• The “gold standard” against which many
protocols are defined
• Not implementable with non-zero
transmission delays
• GPS helps, but there is still latency
• …and GPS only works outdoors.
Internal Clocks
• Most workstations have reasonable clocks
• But clocks drift apart and software resynchronization is
inaccurate
• Workstation clocks are appropriate only for coarse-grained
notion of time
• Unpredictable speeds are a feature of all computing systems
– Thus: can’t reliably predict how long events will take
– Example: how long for message transmission
– Example: how long to compute n!
– Scheduling issues, daemon processes, other applications
• “Artificially” choose bounds which trade off performance for
safety
• “If I don’t receive a response in 15 seconds, server is down”
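The timeout rule quoted above can be sketched as a small predicate. This is a minimal illustration, not part of the slides: the function name `is_suspected` and the constant `TIMEOUT_SECONDS` are hypothetical, and the 15-second bound is simply the value from the example.

```python
TIMEOUT_SECONDS = 15.0  # hypothetical bound, taken from the slide's example


def is_suspected(last_heard: float, now: float,
                 timeout: float = TIMEOUT_SECONDS) -> bool:
    """Return True if no response has arrived within `timeout` seconds.

    Both arguments are readings of the local clock, in seconds.
    """
    return (now - last_heard) > timeout
```

A larger timeout makes a false "server is down" verdict less likely but delays detection of a real failure; a smaller one does the opposite. That is the performance/safety trade-off the slide refers to.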
Logical Time
• Abstraction
• Doesn’t provide a clock in the sense of wall clock time
• Focus is on definition of the “happens before”
relationship between events
• “Could what’s happening in this event have been
influenced by what happened in that event?”
• Events are:
– message sends
– message receptions
– local events
Logical Time: The Picture
[Figure: space-time diagram of processes p0, p1, p2, p3 with events a, b, c, d, e]
• The pairs a,b and a,c are concurrent
• c happens after b
• e happens after a, b, c, d
Happened-Before Relation
• The happened-before relation ( -> ) captures the causal
relationships between events:
– a -> b if a and b are events in the same process and a
occurred before b
– a -> b if a is the send event of a message m and b is
the corresponding receive event
• -> is a transitive relation
• Event a potentially causally affects event b if a -> b
• Event a is concurrent with event b if neither
a -> b nor b -> a
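The three rules above (local order, send before receive, transitivity) can be turned into a small computation over an event trace. The sketch below is illustrative only: the function names and the two-process trace in the usage note are hypothetical and do not come from the slides.

```python
from itertools import product


def happened_before(process_events, messages):
    """Return the set of pairs (a, b) such that a -> b.

    process_events: list of per-process event lists, each in local order.
    messages: list of (send_event, receive_event) pairs.
    """
    events = [e for proc in process_events for e in proc]
    hb = set()
    # Rule 1: events in the same process are ordered by local occurrence.
    for proc in process_events:
        for idx, a in enumerate(proc):
            for b in proc[idx + 1:]:
                hb.add((a, b))
    # Rule 2: a send happens before the corresponding receive.
    hb.update(messages)
    # Rule 3: transitive closure; the intermediate event k is the outer loop.
    for k, a, b in product(events, events, events):
        if (a, k) in hb and (k, b) in hb:
            hb.add((a, b))
    return hb


def concurrent(a, b, hb):
    """Two distinct events are concurrent if neither happened before the other."""
    return a != b and (a, b) not in hb and (b, a) not in hb
```

For a hypothetical trace where one process executes a then b, another executes c then d, and b sends a message received at d, `happened_before([["a", "b"], ["c", "d"]], [("b", "d")])` contains ("a", "d") by transitivity, while a and c are concurrent.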
Three Important Schemes
• Lamport’s logical clocks
– clock is a single integer
• Vector clocks
– clock is a vector of integers with length n
– n = number of processes
• Matrix clocks
– clock is an n x n matrix of integers
– n = number of processes
– Useful for ordered multicast…covered later
• Optimizations for the latter two schemes to reduce
overheads are important!
Lamport’s Clocks
• Lamport’s clocks respect the partial ordering
defined by the -> relation: if a -> b, then C(a) < C(b)
• Lamport’s logical clocks:
– Each process i maintains a counter Ci whose
current value is attached to each message sent
– Internal events in a process cause the counter Ci to
be incremented
– When a message is received by a process i, Ci is set
to max(m.Cj, Ci) + 1
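A minimal sketch of these rules follows. The class and method names are hypothetical, and one detail is an assumption: the slide does not say whether a send increments the counter, so here a send is treated as an event that ticks the clock before its value is attached to the message.

```python
class LamportClock:
    """Single-integer logical clock for one process."""

    def __init__(self):
        self.time = 0

    def local_event(self):
        # Internal events increment the counter.
        self.time += 1
        return self.time

    def send(self):
        # Assumption: a send is an event, so tick first; the returned
        # value is the timestamp attached to the outgoing message.
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # On receipt, set the counter to max(message timestamp, own counter) + 1.
        self.time = max(msg_time, self.time) + 1
        return self.time
```

With two fresh clocks p and q, a local event at p followed by a send yields timestamp 2, and q's clock moves to 3 when it receives that message, so the receive is stamped later than the send.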
Lamport’s Clock
• Happened-before relation:
– a -> b : Event a occurred before event b. Events in the same
process.
– a -> b : If a is the event of sending a message m in a process
and b is the event of receipt of the same message m by
another process.
– a -> b, b -> c, then a -> c. “->” is transitive.
• Causally Ordered Events
– a -> b : Event a “causally” affects event b
• Concurrent Events
– Exactly one is true: a -> b (or) b -> a (or) a || b.
Space-time Diagram
[Figure: space-time diagram. Vertical axis: Space; horizontal axis: Time. P1 has events e11, e12, e13, e14; P2 has events e21, e22, e23, e24.]
Logical Clocks
• Conditions satisfied:
– Ci is clock in Process Pi.
– If a -> b in process Pi, Ci(a) < Ci(b)
– a: sending message m in Pi; b: receiving message m in Pj; then,
Ci(a) < Cj(b).
• Implementation Rules:
– R1: Ci = Ci + d (d > 0); clock is updated between two successive
events.
– R2: Cj = max(Cj, tm + d); (d > 0); When Pj receives a message m
with a time stamp tm (tm assigned by Pi, the sender; tm =
Ci(a), a being the event of sending message m).
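Rules R1 and R2 can be written as two small functions to check that they give the required conditions. This is a sketch under one stated assumption: since a receive is itself an event, R1 is applied to Cj before R2 takes the maximum. The function names `r1` and `r2` are hypothetical.

```python
def r1(c, d=1):
    """R1: advance the clock by d > 0 between two successive events."""
    if d <= 0:
        raise ValueError("d must be positive")
    return c + d


def r2(c_j, tm, d=1):
    """R2: clock of Pj after receiving a message stamped tm.

    Assumption: R1 is applied first because the receive is an event,
    then the result is compared with tm + d.
    """
    return max(r1(c_j, d), tm + d)
```

Under this reading and with d = 1, `r2(c_j, tm)` equals max(tm, c_j) + 1, which is the receive rule stated on the earlier Lamport's Clocks slide. For any d > 0 the result exceeds both tm and the previous value of Cj, which gives Ci(a) < Cj(b) for a send a and its receive b.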
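Space-time Diagram
[Figure: space-time diagram with Lamport timestamps. Vertical axis: Space; horizontal axis: Time.
P1: e11 (1), e12 (2), e13 (3), e14 (4), e15 (5), e16 (6), e17 (7)
P2: e21 (1), e22 (2), e23 (3), e24 (4), e25 (7)]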
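Limitation of Lamport’s Clock
• If a -> b then C(a) < C(b), but C(a) < C(b) does not imply a -> b
[Figure: space-time diagram. Vertical axis: Space; horizontal axis: Time. P1 has events e11, e12, e13, e14; P2 has events e21, e22, e23, e24; P3 has events e31, e32, e33.]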
Vector Clocks
• Keep track of transitive dependencies among processes
for recovery purposes.
• Ci[1..n] is a “vector” clock at process Pi whose entries
are the “assumed”/“best guess” clock values of different
processes.
• Ci[j] (j != i) is the best guess of Pi for Pj’s clock.
• Vector clock rules:
– Ci[i] = Ci[i] + d, (d > 0); for successive events in Pi
– For all k, Cj[k] = max(Cj[k], tm[k]), when a message m
with vector timestamp tm is received by Pj from Pi.
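The two vector clock rules can be sketched as follows, together with the usual component-wise comparison for deciding causality. Names are hypothetical and processes are 0-indexed. The function `replay_diagram` replays a message pattern that reproduces the timestamps listed in this deck's vector clock diagram; that pattern is inferred from the timestamps and is an assumption, since the arrows of the diagram are not available in the text.

```python
class VectorClock:
    """Vector clock for process `i` in a system of `n` processes."""

    def __init__(self, n, i):
        self.i = i
        self.v = [0] * n

    def tick(self, d=1):
        # Rule 1: advance the local entry between successive events.
        self.v[self.i] += d

    def send(self):
        # A send is an event; the returned copy is the timestamp tm.
        self.tick()
        return list(self.v)

    def receive(self, tm):
        # A receive is an event too (assumption: tick first), then
        # Rule 2: take the component-wise maximum with tm.
        self.tick()
        self.v = [max(a, b) for a, b in zip(self.v, tm)]


def leq(u, v):
    return all(a <= b for a, b in zip(u, v))


def happened_before(u, v):
    """u -> v iff u <= v component-wise and u != v."""
    return leq(u, v) and u != v


def concurrent(u, v):
    return not leq(u, v) and not leq(v, u)


def replay_diagram():
    p1, p2, p3 = VectorClock(3, 0), VectorClock(3, 1), VectorClock(3, 2)
    p1.tick()        # e11: (1,0,0)
    p2.tick()        # e21: (0,1,0)
    m1 = p1.send()   # e12: (2,0,0)
    p2.receive(m1)   # e22: (2,2,0)
    m2 = p3.send()   # e31: (0,0,1)
    p2.receive(m2)   # e23: (2,3,1)
    m3 = p2.send()   # e24: (2,4,1)
    p1.receive(m3)   # third P1 event: (3,4,1)
    p3.tick()        # e32: (0,0,2)
    return p1.v, p2.v, p3.v
```

Unlike a single Lamport integer, two vector timestamps can be compared to decide whether the events are causally related or concurrent: (1,0,0) and (0,1,0) are concurrent, while (2,0,0) happened before (2,2,0).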
Vector Clock …
[Figure: space-time diagram with vector timestamps. Vertical axis: Space; horizontal axis: Time.
P1: e11 (1,0,0), e12 (2,0,0), e14 (3,4,1)
P2: e21 (0,1,0), e22 (2,2,0), e23 (2,3,1), e24 (2,4,1)
P3: e31 (0,0,1), e32 (0,0,2)]
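Causal Ordering of Messages
[Figure: space-time diagram with processes P1, P2, P3. Send(M1) and Send(M2) are marked, with labels (1) and (2). Vertical axis: Space; horizontal axis: Time.]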
Message Ordering …
• Here the concern is not really maintaining clocks.
• Order the messages sent and received among all
processes in a distributed system.
• If Send(M1) -> Send(M2), M1 should be received ahead of
M2 by all processes.
• This is not guaranteed by the communication network
since M1 may be from P1 to P2 and M2 may be from P3
to P4.
• Message ordering:
– Deliver a message only if the preceding one has already been
delivered.
– Otherwise, buffer it up.
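The "deliver or buffer" rule can be sketched for one particular setting. The slides do not name a protocol, so this example swaps in an assumption: every message is broadcast to all processes and carries a vector timestamp, in the style of the Birman-Schiper-Stephenson protocol. The class name `CausalReceiver` and its fields are hypothetical.

```python
class CausalReceiver:
    """Buffers broadcast messages until their causal predecessors are delivered."""

    def __init__(self, n):
        self.delivered = [0] * n  # delivered[k]: messages from process k delivered here
        self.buffer = []          # messages still waiting for predecessors
        self.log = []             # payloads in delivery order

    def _deliverable(self, sender, stamp):
        # It must be the next message from the sender, and every message
        # the sender had delivered before sending must be delivered here too.
        return (stamp[sender] == self.delivered[sender] + 1 and
                all(stamp[k] <= self.delivered[k]
                    for k in range(len(stamp)) if k != sender))

    def receive(self, sender, stamp, payload):
        self.buffer.append((sender, stamp, payload))
        progress = True
        while progress:  # one delivery may unblock buffered messages
            progress = False
            for msg in list(self.buffer):
                s, v, p = msg
                if self._deliverable(s, v):
                    self.buffer.remove(msg)
                    self.delivered[s] += 1
                    self.log.append(p)
                    progress = True
```

Suppose process 0 broadcasts M1 with stamp [1, 0, 0], and process 1 delivers M1 and then broadcasts M2 with stamp [1, 1, 0]. If a third process receives M2 first, M2 stays in the buffer; once M1 arrives, both are delivered in the order M1, M2.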
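Global State
[Figure: accounts A and B connected by channels C1 and C2, shown in three global states]
• Global State 1: A = $500, B = $200, C1: Empty, C2: Empty
• Global State 2: A = $450, B = $200, C1: Tx $50, C2: Empty
• Global State 3: A = $450, B = $250, C1: Empty, C2: Empty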
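One way to read the three global states is that the total amount of money, counting what is in transit on the channels, stays the same. The sketch below checks this with the figures from the slide. The dictionary layout and the function name are hypothetical, and the final "mixed" recording is an added illustration of what goes wrong when local states are recorded at unrelated moments; it is not taken from the slides.

```python
# Balances and channel contents as given for the three global states.
global_states = {
    "Global State 1": {"A": 500, "B": 200, "C1": [], "C2": []},
    "Global State 2": {"A": 450, "B": 200, "C1": [50], "C2": []},
    "Global State 3": {"A": 450, "B": 250, "C1": [], "C2": []},
}


def total_money(state):
    """Sum of both balances plus any amounts in transit on the channels."""
    return state["A"] + state["B"] + sum(state["C1"]) + sum(state["C2"])


# Hypothetical bad recording: A recorded before the transfer was sent,
# B recorded after it was received. The $50 is counted twice.
mixed = {"A": 500, "B": 250, "C1": [], "C2": []}
```

Each of the three states totals $700, while the mixed recording totals $750: B's state includes the receipt of a message whose send is not in A's state.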
Recording Global State
• send(Mij): message M sent from Si to Sj
• rec(Mij): message M received by Sj, from Si
• time(x): time of event x
• LSi: local state at Si
• send(Mij) is in LSi iff (if and only if) time(send(Mij)) <
time(LSi)
• rec(Mij) is in LSj iff time(rec(Mij)) < time(LSj)
• transit(LSi, LSj): set of messages sent/recorded at LSi and
NOT received/recorded at LSj
Recording Global State …
• inconsistent(LSi, LSj): set of messages NOT sent/recorded
at LSi and received/recorded at LSj
• Global State, GS: {LS1, LS2, …, LSn}
• Consistent Global State: GS = {LS1, …, LSn} AND for all i, j in 1..n,
inconsistent(LSi, LSj) is empty.
• Transitless global state: GS = {LS1, …, LSn} AND for all i, j in 1..n,
transit(LSi, LSj) is empty.
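These definitions map directly onto set differences. In the sketch below, the representation of a local state (a dict with "sent" and "received" maps keyed by the peer site) is an assumption made for illustration, as are all function names and the two-site example in the usage note.

```python
def transit(gs, i, j):
    """Messages whose send is recorded in LSi but whose receive is not in LSj."""
    sent = gs[i]["sent"].get(j, set())
    received = gs[j]["received"].get(i, set())
    return sent - received


def inconsistent(gs, i, j):
    """Messages whose receive is recorded in LSj but whose send is not in LSi."""
    sent = gs[i]["sent"].get(j, set())
    received = gs[j]["received"].get(i, set())
    return received - sent


def _pairs(gs):
    return [(i, j) for i in gs for j in gs if i != j]


def is_consistent(gs):
    return all(not inconsistent(gs, i, j) for i, j in _pairs(gs))


def is_transitless(gs):
    return all(not transit(gs, i, j) for i, j in _pairs(gs))


def is_strongly_consistent(gs):
    return is_consistent(gs) and is_transitless(gs)
```

As a hypothetical example, take gs = {1: {"sent": {2: {"M1"}}, "received": {}}, 2: {"sent": {}, "received": {1: {"M2"}}}}. Then transit(gs, 1, 2) is {"M1"} and inconsistent(gs, 1, 2) is {"M2"}, so this global state is neither transitless nor consistent.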
Recording Global State …
[Figure: sites S1 and S2 with recorded local states LS1 and LS2 and messages M1 and M2]
• M1: transit
• M2: inconsistent
Recording Global State …
• Strongly consistent global state: consistent and
transitless, i.e., all send and the corresponding receive
events are recorded in all LSi.
[Figure: recorded local states LS11, LS12; LS21, LS22, LS23; LS31, LS32, LS33]
Chandy-Lamport Algorithm
• Distributed algorithm to capture a consistent global state.
• Communication channels are assumed to be FIFO.
• Uses a marker to initiate the algorithm. The marker is a sort of dummy
message, with no effect on the functions of processes.
• Sending Marker by P:
– P records its state. For each outgoing channel C, P sends a
marker on C before sending any further messages on C.
• Receiving Marker by Q (on incoming channel C):
– If Q has NOT recorded its state: record its state, record the state
of C as an empty sequence, and SEND marker (use above rule).
– Else (Q has recorded state before): record the state of C as the
sequence of messages received along C after Q’s state was
recorded and before Q received the marker.
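The marker rules can be exercised in a small single-threaded simulation. This is a sketch under stated assumptions, not a reference implementation: channels are modeled as FIFO queues in a dict keyed by (source, destination), the application state is an account balance, and all names (`Process`, `transfer`, `deliver`, `demo`) are hypothetical.

```python
from collections import deque

MARKER = "MARKER"


class Process:
    def __init__(self, pid, balance, peers):
        self.pid = pid
        self.balance = balance       # application state
        self.peers = peers
        self.recorded_state = None   # local snapshot, once taken
        self.channel_state = {}      # incoming channel -> recorded messages
        self.recording = set()       # incoming channels still being recorded

    def record_and_send_markers(self, net):
        # Marker sending rule: record own state, then send a marker on
        # every outgoing channel before any further message.
        self.recorded_state = self.balance
        self.recording = set(self.peers)
        self.channel_state = {p: [] for p in self.peers}
        for p in self.peers:
            net[(self.pid, p)].append(MARKER)

    def on_message(self, src, msg, net):
        if msg == MARKER:
            if self.recorded_state is None:
                self.record_and_send_markers(net)
            # Stop recording this channel; its state is what was logged
            # so far (empty if this was the first marker seen).
            self.recording.discard(src)
        else:
            self.balance += msg
            if src in self.recording:
                self.channel_state[src].append(msg)


def transfer(net, procs, src, dst, amount):
    procs[src].balance -= amount
    net[(src, dst)].append(amount)


def deliver(net, procs, src, dst):
    # FIFO delivery: always take the oldest message on the channel.
    procs[dst].on_message(src, net[(src, dst)].popleft(), net)


def demo():
    procs = {"A": Process("A", 500, ["B"]), "B": Process("B", 200, ["A"])}
    net = {("A", "B"): deque(), ("B", "A"): deque()}
    transfer(net, procs, "A", "B", 50)
    procs["A"].record_and_send_markers(net)  # A initiates the snapshot
    transfer(net, procs, "B", "A", 30)       # sent before B sees the marker
    deliver(net, procs, "B", "A")            # A logs 30 as channel state
    deliver(net, procs, "A", "B")            # B receives the 50
    deliver(net, procs, "A", "B")            # B receives the marker
    deliver(net, procs, "B", "A")            # A receives B's marker
    return procs
```

In this run A records 450, B records 220, and the channel from B to A is recorded as containing the 30 in transit, so the snapshot totals 700, the same as the actual balances (480 + 220) at the end of the run.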
Chandy-Lamport Algorithm
• Initiation of marker can be done by any process, with its own unique
marker: <process id, sequence number>.
• Several processes can initiate state recording by sending markers.
Concurrent sending of markers allowed.
• One possible way to collect global state: all processes send the
recorded state information to the initiator of marker. Initiator
process can assemble the global state.
[Figure: labels Seq, Sj, Si, Sc, Seq’]
Chandy-Lamport Algorithm ...
• Example:
[Figure: processes Pi, Pj, Pk. Markers are sent between the processes and each process records channel state.]
Channel state example: M1 sent to Px at t1, M2 sent to Py at t2, ….