LayeredRand - The University of Texas at Austin

An Efficient Decentralized Algorithm for the Distributed Trigger Counting (DTC) Problem
Venkatesan T. Chakravarthy (IBM Research-India)
Anamitra Roy Choudhury (IBM Research-India)
Vijay Garg (University of Texas at Austin)
Yogish Sabharwal (IBM Research-India)
Distributed Trigger Counting (DTC) Problem



Distributed system with n processors
Each processor receives some triggers from an external source
Report to the user when the total number of triggers received by all processors equals w (in general, w >> n)
Applications of the DTC Problem

Distributed monitoring



traffic volume: raise an alarm if the number of vehicles on a highway exceeds some threshold
wildlife behavior: detect when the number of sightings of a particular species in a wildlife region exceeds a given value
Global Snapshots:

the distributed system must determine whether all in-transit messages have been received in order to declare the snapshot valid.
This problem reduces to the DTC Problem [Garg, Garg, Sabharwal 2006]
Assumptions:

complete graph model, i.e., any processor can
communicate with any other processor

no shared clock and no shared memory

processors communicate using messages

reliable message delivery

no faults in the processors.
Measures of a DTC Algorithm:

Low message complexity: the total number of messages exchanged

Low MaxRcvLoad: the maximum number of messages received by any processor in the system

Low MsgLoad: the maximum number of messages communicated by any processor in the system
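To make the three measures concrete, here is a small Python sketch that computes them from a log of (sender, receiver) pairs. The function name dtc_measures and the log format are hypothetical, and MsgLoad is interpreted here as messages sent plus messages received by a processor.

```python
from collections import Counter


def dtc_measures(messages):
    """Return (message complexity, MaxRcvLoad, MsgLoad) for a message log.

    `messages` is a list of (sender, receiver) pairs. MsgLoad is taken as
    messages sent plus messages received by a processor (an interpretation).
    """
    sent = Counter(src for src, _ in messages)
    received = Counter(dst for _, dst in messages)
    total = len(messages)
    max_rcv_load = max(received.values(), default=0)
    processors = set(sent) | set(received)
    msg_load = max((sent[p] + received[p] for p in processors), default=0)
    return total, max_rcv_load, msg_load


# Processors 1, 2, 3 each report once to processor 0, which replies to 1.
log = [(1, 0), (2, 0), (3, 0), (0, 1)]
print(dtc_measures(log))  # (4, 3, 4)
```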
Trivial Algorithm

Fix one node to be the Master Node

The total deficit (initially w) is maintained by the Master Node

Any processor that receives a trigger informs the Master Node
The Master Node decrements the deficit

Finish when the deficit reaches zero

Total messages = O(w)

MaxRcvLoad and MsgLoad are also O(w), since every message is received by the master
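A minimal Python sketch of the trivial algorithm, assuming (for illustration only) that each trigger lands on a uniformly random processor and that processor 0 is the master; the function name trivial_dtc is hypothetical. Every message goes to the master, which is why MaxRcvLoad is as large as the message count.

```python
import random


def trivial_dtc(n, w, seed=0):
    """Trivial DTC sketch: processor 0 is the master and holds the deficit.

    Each of the w triggers arrives at a uniformly random processor (an
    assumption; the algorithm itself works for any trigger pattern).
    A non-master processor sends one message per trigger to the master.
    Returns the number of messages, all of which the master receives.
    """
    rng = random.Random(seed)
    deficit = w                    # total deficit, kept at the master
    messages = 0
    while deficit > 0:
        p = rng.randrange(n)       # processor that receives the next trigger
        if p != 0:
            messages += 1          # p informs the master
        deficit -= 1               # master decrements the deficit
    return messages                # deficit is zero: report to the user


print(trivial_dtc(n=8, w=1000))
```

With random placement, roughly (n-1)/n of the w triggers generate a message, so the count grows linearly in w.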
Previous Work:

Any deterministic algorithm has message complexity Ω(n log(w/n))

Centralized algorithm
[Garg et al]



message complexity O(n log w).
MaxRcvLoad can be as high as O(n log w).
Tree-based algorithm



message complexity O(n log n log w).
more decentralized in a heuristic sense.
MaxRcvLoad can be as high as O(n log n log w), in the worst case.
Algorithm              Message Complexity    MaxRcvLoad
Centralized            O(n log w)            O(n log w)
Tree based             O(n log n log w)      O(n log n log w)
LayeredRand            O(n log n log w)      O(log n log w)
CoinRand [IPDPS 11]    O(n log w)            O(log w)
Modifications to the trivial algorithm




A processor sends a message (the count of triggers received) to the master only after it has received B triggers.
Works in multiple rounds.
w’: deficit at the beginning of a round (initially w’ = w)
$B = \frac{w'}{2n}$

The master keeps count of the triggers reported by other processors and the triggers received by itself.
End-of-round is declared when the count reaches w’/2

The system never enters a dead state (all triggers delivered, but the round never ends):
Unreported triggers at each processor < B, so fewer than nB = w’/2 in total
Count of triggers at the master < w’/2
Together they would account for fewer than w’ triggers, a contradiction
Message complexity O(n log w)
log w rounds, since each round receives at least half of the remaining w’ triggers
w’/(2B) = n messages exchanged in every round
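The round-based scheme above can be simulated in a few lines of Python. This is a sketch under stated assumptions, not the authors' implementation: processor 0 is the master, triggers land on random processors, B is rounded down to an integer (at least 1, which still rules out a dead state), and the end-of-round exchange is charged 2(n-1) messages (the master asking every other processor for its unreported count and receiving a reply). The function name centralized_dtc is hypothetical.

```python
import random


def centralized_dtc(n, w, seed=0):
    """Sketch of the round-based centralized DTC algorithm.

    Processor 0 is the master. Returns (messages, rounds). The 2(n-1)
    messages charged per end-of-round are an assumption.
    """
    rng = random.Random(seed)
    remaining = w                      # w': deficit at the start of the round
    local = [0] * n                    # unreported triggers per processor
    master_count = 0                   # triggers known to the master this round
    messages = rounds = 0
    B = max(1, remaining // (2 * n))
    for _ in range(w):                 # deliver the w triggers one at a time
        p = rng.randrange(n)
        if p == 0:
            master_count += 1          # master counts its own triggers directly
        else:
            local[p] += 1
            if local[p] >= B:          # report only after B triggers
                master_count += local[p]
                local[p] = 0
                messages += 1
        if 2 * master_count >= remaining:      # count reached w'/2
            messages += 2 * (n - 1)            # collect the unreported counts
            remaining -= master_count + sum(local)
            local = [0] * n
            master_count = 0
            rounds += 1
            if remaining == 0:
                break
            B = max(1, remaining // (2 * n))
    return messages, rounds


msgs, rounds = centralized_dtc(n=16, w=10000)
print(msgs, rounds)
```

Each round uses at most about 4n messages (at most 2n reports plus the end-of-round exchange), and the deficit at least halves every round, which is where the O(n log w) bound comes from.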
Main Result:
LayeredRand, a decentralized randomized algorithm.

Theorem: For any trigger pattern, the message complexity of the LayeredRand algorithm is O(n log n log w). Also, there exist constants c, d > 1 such that
$\Pr[\text{MaxRcvLoad} > c \log n \log w] < 1/n^d$
LayeredRand Algorithm






n = $2^L - 1$ processors arranged in L layers
the l-th layer has $2^l$ processors, l = 0 to L-1
The algorithm proceeds in multiple rounds.
w’: value at the start of a round (number of triggers yet to be received)

Threshold for the l-th layer defined as $\tau(l) = \frac{w'}{4 \cdot 2^l \cdot \log(n+1)}$

C(x): count of triggers received by x and by some processors in the layers below (passed up to x as coins).
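The layer layout and thresholds can be computed directly. The sketch below assumes heap numbering (processor 0 is the root and layer l holds ids 2^l - 1 to 2^(l+1) - 2), which is an illustrative choice, and takes log base 2, so log(n+1) = L. How non-integer thresholds are rounded is not specified here, so the sketch keeps them as real numbers. The function names layers and thresholds are hypothetical. For n = 7 and w’ = 96 it reproduces the values τ(1) = 4 and τ(2) = 2 used in the example.

```python
def layers(n):
    """Processor ids per layer, assuming heap numbering of n = 2**L - 1 nodes."""
    L = (n + 1).bit_length() - 1          # n = 2**L - 1  =>  L = log2(n + 1)
    if n != 2**L - 1:
        raise ValueError("n must be of the form 2**L - 1")
    return [list(range(2**l - 1, 2**(l + 1) - 1)) for l in range(L)]


def thresholds(w_prime, n):
    """tau(l) = w' / (4 * 2**l * log2(n + 1)) for the non-root layers 1..L-1."""
    L = (n + 1).bit_length() - 1
    return {l: w_prime / (4 * 2**l * L) for l in range(1, L)}


print(layers(7))           # [[0], [1, 2], [3, 4, 5, 6]]
print(thresholds(96, 7))   # {1: 4.0, 2: 2.0}
```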
LayeredRand Algorithm (Contd.)

For a non-root processor x at layer l:

If a trigger is received: C(x)++;
If C(x) >= τ(l):
pick a processor y from layer l-1 at random and send a coin to y;
C(x) := C(x) - τ(l);
If a coin is received from layer l+1: C(x) := C(x) + τ(l+1).
Root r
maintains C(r) just like the other processors.
If $C(r) > w'/2$, it initiates the end-of-round procedure, which
gets the total number of triggers received in this round, and
broadcasts the new value of w’ for the next round.
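A sequential simulation helps check the rules above. This Python sketch is not the authors' code. It assumes heap numbering of processors, thresholds rounded up to integers (rounding up keeps τ(l) - 1 below the real-valued threshold, so the correctness argument still applies), triggers delivered one at a time to uniformly random processors, and an idealized end-of-round step in which the root learns the round's exact total. The name layered_rand is hypothetical. The assertions inside encode two claims: a round never runs out of triggers without the root ending it, and coins only move counts between processors.

```python
import random


def layered_rand(L, w, seed=0):
    """Sequential sketch of LayeredRand on n = 2**L - 1 processors.

    Assumptions (not from the slides): heap numbering (layer l holds ids
    2**l - 1 .. 2**(l+1) - 2, root is id 0), thresholds rounded up to
    integers, uniformly random trigger placement, idealized end-of-round.
    Returns (coins_sent, rounds, coins_received_per_processor).
    """
    rng = random.Random(seed)
    n = 2**L - 1
    layer = [(p + 1).bit_length() - 1 for p in range(n)]
    received = [0] * n               # coins received by each processor, all rounds
    coins = rounds = 0
    w_prime = w                      # triggers not yet received
    while w_prime > 0:
        # tau[l] = ceil(w' / (4 * 2**l * log2(n + 1))) for l >= 1; tau[0] unused
        tau = [0] + [-(-w_prime // (4 * 2**l * L)) for l in range(1, L)]
        C = [0] * n
        delivered = 0
        while 2 * C[0] <= w_prime:   # root ends the round once C(r) > w'/2
            x = rng.randrange(n)     # next trigger arrives at processor x
            delivered += 1
            assert delivered <= w_prime, "dead state reached"
            C[x] += 1
            # a coin may cascade upward: each receiver re-checks its threshold
            while x != 0 and C[x] >= tau[layer[x]]:
                l = layer[x]
                C[x] -= tau[l]
                y = rng.randrange(2**(l - 1) - 1, 2**l - 1)  # random node in layer l-1
                C[y] += tau[l]
                received[y] += 1
                coins += 1
                x = y
        assert sum(C) == delivered   # coins move counts, never create or lose them
        w_prime -= sum(C)            # root learns the round total, broadcasts new w'
        rounds += 1
    return coins, rounds, received


coins, rounds, received = layered_rand(L=4, w=5000)
print(coins, rounds, max(received))
```

In this run (n = 15) the leaves never receive coins, and the random choice of receiver spreads the load over each upper layer.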
Example
[Figure: 7 processors in 3 layers (root G; layer 1: E, F; layer 2: A, B, C, D) with w’ = 96, τ(1) = 4, τ(2) = 2. At the end of the round, 53 triggers have been counted, so w’ for the next round = 96 - 53 = 43.]
Analysis

The system does not stall in the middle of a round once all the triggers have been delivered.

Message complexity is O(n log n log w)

MaxRcvLoad is bounded by O(log n log w) with high probability
Correctness


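Consider the state of the system in the middle of any round.
Coins only move counts between processors, so $\sum_x C(x)$ equals the number of triggers received in the round.
x: any non-root processor at layer l
$C(x) \le \tau(l) - 1$, hence
$\sum_{x \ne r} C(x) \le \sum_{l=1}^{L-1} 2^l (\tau(l) - 1) < \sum_{l=1}^{L-1} \frac{w'}{4 \log(n+1)} < \frac{w'}{4}$

A dead state (all w’ triggers delivered but no end-of-round) thus implies C(r) > 3w’/4 > w’/2, contradicting the root's end-of-round rule.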
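As a numeric sanity check of this argument, the Python sketch below evaluates the bound on the non-root total, the sum of 2^l (τ(l) - 1) over layers 1 to L-1, using the real-valued thresholds; the helper name max_nonroot_total is hypothetical. For the example (w’ = 96, L = 3) the non-root processors can hold at most 10 triggers, so in a dead state the root would hold at least 86 > 72 = 3w’/4.

```python
def max_nonroot_total(w_prime, L):
    """Upper bound on the sum of C(x) over all non-root processors.

    Uses the invariant C(x) <= tau(l) - 1 for every non-root x at layer l,
    with the real-valued threshold tau(l) = w' / (4 * 2**l * L), where
    L = log2(n + 1) and layer l holds 2**l processors.
    """
    return sum(2**l * (w_prime / (4 * 2**l * L) - 1) for l in range(1, L))


# Example values: w' = 96, n = 7 (L = 3), so tau(1) = 4 and tau(2) = 2.
bound = max_nonroot_total(96, 3)
print(bound)  # 10.0
```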
Message Complexity O(n log n log w)



log w rounds
Every coin sent from layer l to layer l-1 means that at least τ(l) triggers have been received at layer l or below in this round.
#coins sent from layer l to layer l-1 is at most w’/τ(l)
#coins sent in a particular round:
$\sum_{l=1}^{L-1} \frac{w'}{\tau(l)} = \sum_{l=1}^{L-1} 4 \cdot 2^l \log(n+1) = 4(n-1)\log(n+1)$

O(n) message exchanges for every end-of-round procedure.
Total: O(log w) rounds, each with O(n log n) messages, giving O(n log n log w).
MaxRcvLoad O(log n log w) w.h.p.
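Prob[MaxRcvLoad of some processor exceeds c log(n+1) log w] < $n^{-(c-1)}$, for any constant c ≥ 48

In any given round, #coins received by layer l ≤ $\frac{w'}{\tau(l+1)} = 4 \cdot 2^{l+1} \log(n+1)$

Each coin is sent uniformly and independently at random to one of the $2^l$ processors occupying layer l.

$M_x$: random variable denoting the number of coins received by x over all rounds

$E[M_x] \le 8 \log(n+1) \log w$ (at most $4 \cdot 2^{l+1} \log(n+1) / 2^l = 8 \log(n+1)$ coins per round, over log w rounds)

$\Pr[M_x > 8a \log(n+1) \log w] < 2^{-8a \log(n+1) \log w} < n^{-8a}$, for a ≥ 6 (Chernoff bound)

The result above follows by applying the union bound over the n processors, with c = 8a.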
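The balls-into-bins step of this argument can be explored empirically. The Python sketch below is an illustration, not part of the proof: it assumes layer l receives the per-round maximum of 4 · 2^(l+1) · L coins (L = log(n+1)), each going to a uniformly random processor of the layer, and reports the heaviest load over a given number of rounds. The name max_coins_layer is hypothetical.

```python
import random


def max_coins_layer(l, L, rounds, seed=0):
    """Heaviest coin load on a layer-l processor under the worst-case coin count.

    Each round, 4 * 2**(l+1) * L coins are thrown, each to one of the 2**l
    processors of layer l chosen uniformly at random. Requires l <= L - 2
    (leaves at layer L-1 never receive coins).
    """
    rng = random.Random(seed)
    load = [0] * 2**l
    for _ in range(rounds):
        for _ in range(4 * 2**(l + 1) * L):
            load[rng.randrange(2**l)] += 1
    return max(load)


print(max_coins_layer(l=4, L=6, rounds=10))
```

Here the average load is 8 · 6 · 10 = 480 coins per processor, and the maximum stays far below the 48 · 6 · 10 = 2880 level used with a = 6.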
Concurrency

We assumed that the triggers are delivered one at a time:


all the processing required for handling a trigger is completed before the next trigger arrives.
Next, we relax that assumption.
[Figure: the 7-processor example (w’ = 96, τ(1) = 4, τ(2) = 2) with concurrent delivery; at the end of the round ΣC(x) = 53 instead of 55.]
Handling Concurrency

Triggers and coins received during a round are placed in a queue and processed one at a time.

Additional features for handling end-of-round:

Default queue and priority queue
Unprocessed triggers and coins are placed in the default queue
End-of-round messages are placed in the priority queue
The default queue is serviced only when the priority queue is empty
Counters C(x), D(x) and RoundNum
D(x): triggers processed by x since the beginning (not reset between rounds)
C(x): reset after every round
RoundNum: number of the current round
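The queue discipline and counters above can be sketched as a small Python class. Everything here is illustrative: the message tuples, the tagging of coins with the sender's round number, and the name Processor are assumptions. Coin forwarding and the Reduce/InformAck messages are left out, and the Inform handler applies the fourth-phase updates (increment RoundNum, reset C(x), resume the default queue) in a single step.

```python
from collections import deque


class Processor:
    """Sketch of one processor's message handling under concurrency.

    Hypothetical message formats: ("trigger",), ("coin", round_num, amount),
    ("RoundReset",) and ("Inform",).
    """

    def __init__(self):
        self.default = deque()      # unprocessed triggers and coins
        self.priority = deque()     # end-of-round messages
        self.C = 0                  # reset after every round
        self.D = 0                  # triggers processed since the beginning
        self.round_num = 0          # RoundNum
        self.suspended = False      # True between RoundReset and the new round

    def step(self):
        """Process one message; return its kind, or None if nothing can run."""
        if self.priority:           # the priority queue is always serviced first
            kind = self.priority.popleft()[0]
            if kind == "RoundReset":
                self.suspended = True        # freeze D(x) until the new round
            elif kind == "Inform":
                self.round_num += 1          # coins from older rounds become stale
                self.C = 0
                self.suspended = False       # resume the default queue
            return kind
        if self.suspended or not self.default:
            return None
        msg = self.default.popleft()
        if msg[0] == "trigger":
            self.C += 1
            self.D += 1
        elif msg[1] == self.round_num:       # coin from the current round
            self.C += msg[2]
        # otherwise: a coin from a previous round, discarded
        return msg[0]


p = Processor()
p.default.extend([("trigger",), ("coin", 0, 4)])
p.priority.append(("RoundReset",))
print(p.step(), p.step())            # RoundReset None
p.priority.append(("Inform",))
print(p.step(), p.step(), p.step())  # Inform trigger coin
print(p.C, p.D)                      # 1 1
```

The round-0 coin reaches the front of the default queue only after RoundNum has advanced to 1, so it is discarded rather than added to C(x).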
Thank You!!
Backup
End-of-round procedure




Processors are arranged in a tree.
Four phases.
First Phase: the root initiates a RoundReset message
A processor x, on receiving RoundReset,
suspends processing of the default queue until the end of the round, i.e., its D(x) value is not modified further until the new round
Non-leaf processor forwards it to its children;
Leaf processor initiates the second phase
[Figure: first phase on the example tree (root G; layer 1: E, F; layer 2: A, B, C, D) with w’ = 96.]
Second Phase


A leaf processor initiates a Reduce message containing its D(x) value
A processor x, on receiving Reduce from its children:
Non-root processor adds its D(x) value to the sum and forwards it to its parent
Root processor computes the new w’ = w - ΣD(x) and decides between termination (w’ = 0) and the next round.
[Figure: second phase on the example tree; ΣD(x) = 55 at the root, so the new w’ = 96 - 55 = 41.]
Third Phase



The root broadcasts the new w’ in an Inform message.
Every non-leaf processor forwards it to its children;
Leaf processors, on receiving the Inform message, initiate the fourth phase.
[Figure: third phase; the root broadcasts the new w’ = 41 and the thresholds are recomputed, e.g., τ(1) = 2.]
Fourth Phase

Each processor in this phase performs the following:

RoundNum is incremented,
signifying a new round, i.e., the processor does not process any coin from previous rounds.
C(x) is reset to zero.
An InformAck message is sent to its parent.
Processing of the default queue is resumed.
The system (all processors) enters the next round when the root receives InformAck
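The four phases amount to two broadcasts and two convergecasts over the tree. The Python sketch below assumes a heap-ordered binary tree (children of i are 2i+1 and 2i+2), counts one message per tree edge per phase, and computes the new w’ as w minus ΣD(x), relying on D(x) counting triggers since the beginning. The function name end_of_round and the D values in the usage line are hypothetical; with ΣD(x) = 55 and w = 96 it reproduces the new w’ = 41 of the example, using 4(n-1) = O(n) messages.

```python
def end_of_round(n, D, w):
    """Sketch of the four-phase end-of-round procedure on a binary tree.

    Processors 0..n-1 form a heap-ordered tree (an assumed layout). D[i] is
    the number of triggers processor i has processed since the beginning;
    w is the original trigger count. Returns (new_w_prime, messages).
    """
    def children(i):
        return [c for c in (2 * i + 1, 2 * i + 2) if c < n]

    def edges_below(i):             # one message per tree edge in the subtree
        return sum(1 + edges_below(c) for c in children(i))

    def convergecast(i):            # returns (sum of D in subtree, messages used)
        total, msgs = D[i], 0
        for c in children(i):
            sub_total, sub_msgs = convergecast(c)
            total += sub_total
            msgs += sub_msgs + 1    # Reduce message from c to its parent i
        return total, msgs

    messages = edges_below(0)           # Phase 1: RoundReset, root to leaves
    total, reduce_msgs = convergecast(0)  # Phase 2: Reduce, leaves to root
    messages += reduce_msgs
    new_w = w - total                   # root computes the new w'
    messages += edges_below(0)          # Phase 3: Inform(new w'), root to leaves
    messages += edges_below(0)          # Phase 4: InformAck, leaves to root
    return new_w, messages


print(end_of_round(7, [20, 10, 9, 4, 4, 4, 4], 96))  # (41, 24)
```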
[Figure: transition from the end of the round to the next round (w’ = 41); a coin left over from the previous round is discarded.]