Lecture 3:
State, Detection
Anish Arora
CSE 763
The Stability Detection Problem
• A stable property of a distributed system is one that persists:
once a stable property is true it remains true thereafter
• Examples:
• “the computation has terminated”
• “the system is deadlocked”
• “all tokens in a token ring have disappeared”
• Solution
1. Determine the global state of the system
2. Test the global state to see if the stable property holds
Termination Detection
• Processes 0..N-1 arbitrarily connected by channels
• Each process either idle or active
• An active process can become idle spontaneously
• An idle process can become active only upon receiving a
message
The Problem :
Detect that all processes are idle and all channels are empty
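As a small illustration, the condition to be detected can be written as a predicate over a global state. This is only a sketch: the representation (a list of `idle` flags and a dict of per-channel message lists) and the name `terminated` are assumptions made for the example, not part of the lecture.

```python
def terminated(idle, channels):
    """True when every process is idle and every channel is empty.

    idle:     list of booleans, idle[j] is True when process j is idle
    channels: dict mapping (sender, receiver) to the list of messages
              still in transit on that channel (hypothetical layout)
    """
    return all(idle) and all(len(msgs) == 0 for msgs in channels.values())


# Both processes idle, nothing in transit: terminated.
done = terminated([True, True], {(0, 1): [], (1, 0): []})

# Both processes idle, but one message is still in a channel:
# its receipt would reactivate process 1, so this is not termination.
pending = terminated([True, True], {(0, 1): ["m"], (1, 0): []})
```

The predicate is stable: once it holds, no process can send (all are idle) and no process can become active (there is nothing to receive).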
Program and Proof (hand-in-hand) Design
 Step 0 : How to count messages in channels.
process j
  {send msg}     ->  c.j := c.j + 1
▯ {receive msg}  ->  c.j := c.j - 1
Proof :
Invariant I1 ≡ (Sum j :: c.j) = # of messages in channels
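The counting step can be checked with a short simulation. The sketch below assumes a random scheduler and models all channels as one pool of undelivered messages; `simulate_counters` and its parameters are hypothetical names. It asserts after every action that the sum of the counters equals the number of messages in transit.

```python
import random


def simulate_counters(n=4, steps=200, seed=0):
    """Run random send/receive actions and check I1 after each one."""
    rng = random.Random(seed)
    c = [0] * n          # c[j] = messages sent by j minus messages received by j
    in_flight = []       # destination of every message currently in a channel
    for _ in range(steps):
        if in_flight and rng.random() < 0.5:
            # {receive msg}: some pending message is delivered to its destination
            dest = in_flight.pop(rng.randrange(len(in_flight)))
            c[dest] -= 1
        else:
            # {send msg}: a random process sends to a random destination
            src = rng.randrange(n)
            in_flight.append(rng.randrange(n))
            c[src] += 1
        # Invariant I1: (Sum j :: c.j) = # of messages in channels
        assert sum(c) == len(in_flight)
    return c, in_flight


counters, pending = simulate_counters()
```

Individual counters may be negative (a process that has received more than it has sent); only their sum is meaningful.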
Refining the program
 Step 1 : How to detect that all processes are idle.
Consider a logical ring 0 -> N-1 -> … -> 0 and pass a token
Let t denote the location of the token
process j
  {send msg}          ->  c.j := c.j + 1
▯ {receive msg}       ->  c.j := c.j - 1
▯ {propagate token}   j ≠ 0 ∧ t = j ∧ idle.j
                      ->  t := t - 1 ; q := q + c.j
▯ {retransmit token}  j = 0 ∧ t = j ∧ idle.j ∧ ¬(q + c.0 = 0)
                      ->  t := N - 1 ; q := 0
Refining the proof
Proof : We begin with an idealized Invariant ≡ I1 ∧ Q, where
Q ≡ (∀j : t<j ∧ j<N : idle.j) ∧ (q = (Sum j : t<j ∧ j<N : c.j))
However Q is not preserved by one of the actions
(the receive action for j, t < j ∧ j < N)
But when Q is violated, R becomes true, where
R ≡ q + (Sum j : 0 ≤ j ∧ j ≤ t : c.j) > 0
So, we weaken
Invariant ≡ I1 ∧ (Q ∨ R)
However R is not preserved by one of the actions
(the receive action for j, 0 ≤ j and j ≤ t)
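A concrete run shows why this matters for the Step 1 program. The sketch below is a hand-picked schedule for N = 3, under the assumption that process 0 declares termination when it holds the token, is idle, and q + c.0 = 0; the function and helper names are made up for the example. The run ends with a detection although process 2 is still active.

```python
def step1_false_detection():
    """Replay one schedule of the Step 1 program (no colors) for N = 3."""
    n = 3
    idle = [False, True, True]   # assumption: only process 0 starts active
    c = [0, 0, 0]
    t, q = n - 1, 0              # token starts at N-1 with q = 0

    def send(j):
        c[j] += 1

    def receive(j):
        c[j] -= 1
        idle[j] = False          # an idle process becomes active on receipt

    def propagate(j):
        nonlocal t, q
        assert j != 0 and t == j and idle[j]
        t -= 1
        q += c[j]

    send(0)          # 0 sends a message to 2
    propagate(2)     # token leaves 2: t = 1, q = 0, Q holds
    receive(2)       # receive at j > t: Q is violated, R holds
    send(2)          # 2 sends a message to 1
    receive(1)       # receive at j <= t: R is violated as well
    idle[1] = True   # 1 becomes idle spontaneously
    propagate(1)     # token leaves 1: t = 0, q = -1
    idle[0] = True   # 0 becomes idle spontaneously

    detected = t == 0 and idle[0] and q + c[0] == 0
    really_terminated = all(idle) and sum(c) == 0
    return detected, really_terminated


result = step1_false_detection()   # (True, False)
```

The message counts cancel (q + c.0 = -1 + 1 = 0) even though the token passed process 2 before it was reactivated, which is the situation the next refinement guards against.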
Refining the program again
 Step 2 : How to abort a detection when unsure that the
token traversal was uninterrupted.
process j
  {send msg}          ->  c.j := c.j + 1
▯ {receive msg}       ->  c.j := c.j - 1 ; blacken j
▯ {propagate token}   j ≠ 0 ∧ t = j ∧ idle.j
                      ->  t := t - 1 ; q := q + c.j ; whiten j
▯ {retransmit token}  j = 0 ∧ t = j ∧ idle.j ∧ ¬(q + c.0 = 0 ∧ 0 is white)
                      ->  t := N - 1 ; q := 0 ; whiten j
Iterated refinement
Proof : Invariant ≡ I1 ∧ (Q ∨ R ∨ S) where S ≡ (∃j : 0 ≤ j ∧ j ≤ t : j is black)
However S is not preserved by one of the actions
(the propagate action at a black node)
So we introduce a color for the token and get the final program
program of process j
  {send msg}          ->  c.j := c.j + 1
▯ {receive msg}       ->  c.j := c.j - 1 ; blacken j
▯ {propagate token}   j ≠ 0 ∧ t = j ∧ idle.j
                      ->  t := t - 1 ; q := q + c.j ;
                          if black j then blacken token ; whiten j
▯ {retransmit token}  j = 0 ∧ t = j ∧ idle.j ∧
                      ¬(q + c.0 = 0 ∧ token is white ∧ 0 is white)
                      ->  t := N - 1 ; q := 0 ; whiten token ; whiten j
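The final program can be exercised with a small randomized simulation. This is a sketch under several assumptions that are not in the slides: a uniformly random scheduler, token passing modeled as an atomic update of `t`, a `budget` that bounds the number of sends so the underlying computation ends, only process 0 active at the start, and detection declared by process 0 exactly when the retransmit guard's negated condition holds. The function name `run` and its parameters are hypothetical.

```python
import random


def run(n=5, seed=1, budget=30, max_steps=1_000_000):
    """Simulate the colored-token program; return the step at which
    termination is detected. Asserts that the detection is not premature."""
    rng = random.Random(seed)
    active = [True] + [False] * (n - 1)   # assumption: only process 0 starts active
    c = [0] * n
    black = [False] * n
    in_flight = []                        # destination of every undelivered message
    t, q, token_black = n - 1, 0, False   # token starts at N-1, white, with q = 0

    for step in range(1, max_steps + 1):
        moves = [("idle", j) for j in range(n) if active[j]]
        if budget > 0:
            moves += [("send", j) for j in range(n) if active[j]]
        moves += [("recv", k) for k in range(len(in_flight))]
        if not active[t]:
            moves.append(("token", t))    # token actions need an idle holder
        kind, x = rng.choice(moves)

        if kind == "idle":
            active[x] = False
        elif kind == "send":
            in_flight.append(rng.randrange(n))
            c[x] += 1
            budget -= 1
        elif kind == "recv":
            dest = in_flight.pop(x)
            c[dest] -= 1
            black[dest] = True            # blacken j
            active[dest] = True
        elif t != 0:                      # {propagate token}
            q += c[t]
            token_black = token_black or black[t]
            black[t] = False              # whiten j
            t -= 1
        elif q + c[0] == 0 and not token_black and not black[0]:
            # Detection at process 0: it must not be a false positive.
            assert not any(active) and not in_flight
            return step
        else:                             # {retransmit token}
            t, q, token_black = n - 1, 0, False
            black[0] = False
    raise RuntimeError("no detection within max_steps")


steps_taken = run()
```

The internal assertion is the safety property (detection implies termination); the fact that the loop returns at all illustrates the progress property for this particular schedule.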
Termination Detection Predicate
Termination ≡ (∀j :: idle.j) ∧ (# of msgs sent - # of msgs received = 0)
Invariant ≡ ((Sum j :: c.j) = # of msgs sent - # of msgs received)
            ∧ (Q ∨ R ∨ S ∨ T)
Q ≡ (∀j : t<j ∧ j<N : idle.j) ∧ (q = (Sum j : t<j ∧ j<N : c.j))
R ≡ q + (Sum j : 0 ≤ j ∧ j ≤ t : c.j) > 0
S ≡ (∃j : 0 ≤ j ∧ j ≤ t : j is black)
T ≡ token is black
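To see how the four disjuncts hand over to one another, the predicates can be evaluated along a short hand-built run for N = 3. The sketch below is illustrative only: the `State` class, the helper `holding`, and the particular schedule are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class State:
    idle: list
    c: list
    black: list
    t: int
    q: int = 0
    token_black: bool = False


def Q(s):
    visited = range(s.t + 1, len(s.c))          # processes with t < j < N
    return all(s.idle[j] for j in visited) and s.q == sum(s.c[j] for j in visited)


def R(s):
    return s.q + sum(s.c[: s.t + 1]) > 0        # sum over 0 <= j <= t


def S(s):
    return any(s.black[: s.t + 1])              # some j with 0 <= j <= t is black


def T(s):
    return s.token_black


def holding(s):
    """Names of the disjuncts of Q ∨ R ∨ S ∨ T that hold in state s."""
    return [name for name, p in (("Q", Q), ("R", R), ("S", S), ("T", T)) if p(s)]


def trace():
    # Process 0 has sent one message to 2, and the token has already left 2.
    s = State(idle=[False, True, True], c=[1, 0, 0], black=[False] * 3, t=1)
    seen = [holding(s)]
    # Process 2 receives the message (receive at j > t breaks Q).
    s.c[2] -= 1
    s.idle[2] = False
    s.black[2] = True
    seen.append(holding(s))
    # Process 2 sends to 1, and 1 receives (receive at j <= t breaks R).
    s.c[2] += 1
    s.c[1] -= 1
    s.idle[1] = False
    s.black[1] = True
    seen.append(holding(s))
    # Process 1 becomes idle and propagates the token (breaks S, blackens token).
    s.idle[1] = True
    s.q += s.c[1]
    s.token_black = s.token_black or s.black[1]
    s.black[1] = False
    s.t -= 1
    seen.append(holding(s))
    return seen


history = trace()   # [['Q', 'R'], ['R'], ['S'], ['T']]
```

At every point of this run at least one disjunct holds, and each action that falsifies one disjunct establishes the next.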
Proof of correctness
1. Invariant ∧ t = 0 ∧ 0 is white ∧ idle.0 ∧ q + c.0 = 0 ∧ token is white
   ⇒ Termination
2. Invariant ∧ Termination
   leads-to
   t = 0 ∧ 0 is white ∧ idle.0 ∧ q + c.0 = 0 ∧ token is white
Termination Detection
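Proof of (1):
• 0 is white ∧ t = 0 ⇒ ¬S
• q + c.0 = 0 ∧ t = 0 ⇒ ¬R
• token is white ⇒ ¬T
• Hence the antecedent implies Invariant ∧ Q ∧ q + c.0 = 0
  (with t = 0, Q says that processes 1..N-1 are idle and that
  q = (Sum j : 0<j ∧ j<N : c.j); together with idle.0 and q + c.0 = 0
  this gives (∀j :: idle.j) and (Sum j :: c.j) = 0),
  i.e., the antecedent implies Termination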
Proof of (2):
• If termination has occurred, only the propagation and
retransmission actions can execute
• After the first complete traversal of the ring by the token,
all processes are white and the token is white
• At the end of the next traversal, when t = 0, the algorithm
detects the termination of the underlying computation