The many Faces of Clock Synchronization
Christoph Lenzen – MPI for Informatics
thanks to my co-authors: Johannes Bund, Danny Dolev, Stephan Friedrichs,
Matthias Függer, Markus Hofstätter, Florian Huemer, Pankaj Khanchandani, Attila
Kinali, Thomas Locher, Moti Medina, Martin Perner, Thomas Polzer, Markus Posch,
Joel Rybicki, Ulrich Schmid, Martin Sigl, Andreas Steininger, Roger Wattenhofer
Chips are Distributed Systems
- very large (>10^10 transistors)
-> fault-tolerance mandatory
- highly concurrent/parallel
-> synchronous operation
- very fast (>10^9 cycles/s)
-> communication “slow”
Clocking VLSI Circuits
[figure: timeline of rounds r−1, r, r+1, r+2; phases labeled compute, send, receive]
Clock Trees
[figure label: clocked element (e.g. register)]
Distribute clock signal from single source!
+ very simple
+ self-stabilizing: recovers from any transient faults
+ approx. 20 ps = 2·10^-11 s precision (single chip)
Clock Trees: Scalability Issues
- clock tree is single point of failure
-> components must be extremely reliable
- tree dist./physical dist. = Ω(L) (L side length of chip)
-> max. difference of arrival times between adjacent
gates grows linearly with L
-> clock frequency goes down with chip size
- countermeasure: use higher voltage and wider wires
-> electro-magnetic interference causes trouble and
strong currents induce large power consumption
Clock Generation
[figure; label: skew]
- fully connected system of n nodes
- f < n/3 Byzantine faults
- bounded drift: HW clock rates between 1 and θ
- bounded delay: messages take 0 to d time
- minimize skew (max. time between matching “ticks”)
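The skew definition above (maximum time between matching ticks) can be made concrete with a minimal sketch. The input format `tick_times[v][k]`, the real time at which correct node v generates tick k, and the sample values are assumptions chosen for illustration only.

```python
def skew(tick_times):
    """Largest real-time gap between matching ticks of correct nodes.

    tick_times[v][k] is the real time at which correct node v generates
    tick k (hypothetical input format, not taken from the talk).
    """
    num_ticks = min(len(times) for times in tick_times)
    return max(
        max(times[k] for times in tick_times)
        - min(times[k] for times in tick_times)
        for k in range(num_ticks)
    )


# Three correct nodes, three ticks each (made-up times).
example_ticks = [
    [0.0, 10.0, 20.0],
    [0.5, 10.2, 21.0],
    [0.1, 10.9, 20.4],
]
example_skew = skew(example_ticks)
```

In this example the third tick is spread over the interval from 20.0 to 21.0, so it determines the skew.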
Clock Generation: Srikanth-Toueg
Srikanth & Toueg, JACM ‘87
Theorem: ST algorithm has skew 2d.
proof: (correct) node sends tick k+1
-> f+1 correct nodes sent tick k
-> all correct nodes send tick k <d time later
-> all correct nodes send tick k+1 <2d time later
Clock Generation: DARTS
Függer & Schmid, DC ‘12
Ferringer, Fuchs, Kempf & Steininger, DFT ‘06
- HW realization of ST algorithm
- FPGA: 4ns skew
- ASIC: 1.5ns skew
Fuchs & Steininger, JECE ‘11
- not self-stabilizing
- if delays are from (d-ε,d), skew lower bound ≈ε,
but ST has skew d+ε
Lynch & Welch, Inf. and Control ‘84
Clock Generation: Pulse Generation
- fully connected system of n nodes
- f < n/3 Byzantine faults
- bounded drift: HW clock rates between 1 and θ
- bounded delay: messages take 0 to d time
- self-stabilization: arbitrary initial state
- minimize skew (max. time between matching “ticks”)
- minimize stabilization time (max. time to recover)
Clock Generation: Pulse Generation
authors                        rand.  stab. time  bandwidth   appeared in
S. Dolev & Welch               yes    exp(O(f))   O(1)        JACM’04
Daliot, D. Dolev & Parnas      no     O(f^3)      O(log f)    SSS’03
D. Dolev & Hoch                no     O(f)        O(f log f)  SSS’07
D. Dolev, Függer, L. & Schmid  yes    O(f)        O(1)        JACM‘14
L. & Rybicki                   yes    polylog f   polylog f   arxiv’17
                               yes    O(log f)    poly f
                               no     O(f)        O(log f)
non-trivial messages -> minimize channel bandwidth
OP: Pulse Generation “=” Consensus?
Theorem: Given synchronous consensus routine C
- tolerating f faults,
- running in R(f) rounds, and
- using messages of size M(f),
we have pulse synchronization algorithm of skew 2d
- tolerating f faults,
- stabilizing in time O(R(f) log f), and
- using bandwidth O(M(f) log f)
Open Problem: Is pulse synchronization
at least as hard as consensus?
Clock Generation: Lynch-Welch
Lynch & Welch, Inf. & Comp. ‘88
goal: match skew lower bound (no self-stabilization)
- assume (coarse) initial synchrony
- simulate synchronous rounds; each node:
1. broadcasts once (also to self)
2. receives broadcasts
3. updates clock
[figure: local time axis marking round start, reception interval, clock update, and predicted | actual start of next round]
Clock Generation: Lynch-Welch
1. receive clock pulses in window around expected arrival
2. determine average of (f+1)th and (n-f)th arrival time
3. align clock to this value by phase shift
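The three steps above can be sketched in a few lines. This is a minimal illustration, not the hardware implementation discussed in the talk: the function name, the input format (arrival time minus expected arrival time, one entry per node including the node itself), and the sample values are assumptions.

```python
def lw_phase_correction(arrival_offsets, f):
    """Phase shift for one Lynch-Welch style round.

    arrival_offsets: measured arrival time minus expected arrival time for
        the pulse of each of the n nodes, including this node itself
        (hypothetical input format).
    f: maximum number of Byzantine faulty nodes; requires n > 3f.
    """
    n = len(arrival_offsets)
    if n <= 3 * f:
        raise ValueError("need n > 3f")
    ordered = sorted(arrival_offsets)
    low = ordered[f]           # (f+1)-th arrival time
    high = ordered[n - f - 1]  # (n-f)-th arrival time
    # Step 3: the clock is shifted by the average of the two values.
    return (low + high) / 2


# n = 4, f = 1: one faulty node reports a wildly late pulse (made-up values).
offsets = [-0.3, 0.1, 0.2, 50.0]
shift = lw_phase_correction(offsets, f=1)
```

Discarding the f smallest and f largest values means that both selected arrival times lie within the range spanned by correct nodes, so the outlier at 50.0 has no influence on the correction here.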
Theorem: LW algorithm has skew in the order of ε+(θ-1)d.
proof by picture: [figure not reproduced; labels: ETA]
Clock Generation: LW in Hardware
- hybrid SW/HW implementations: TTP, FlexRay
Kopetz & Bauer, Proc. IEEE '03
Függer, Armengaud & Steininger, IEEE Trans. Ind. Informat. '09
- skew in microseconds range
-> full HW implementation required
- 0.2ns skew
Huemer, Kinali & L., ISVLSI ‘16
for scale: 1 ns^-1 = 1 GHz
Clock Generation: LW in Hardware
             LW in HW     vs.  Shamanna, Kurd, Douglas & Morrise, VLSIC ‘10
skew         200 ps            20 ps
technology   FPGAs             single chip
reliability  f=1               f=0
dev. costs   3,000€ HW         $$$$$
*Stephan’s favorite opening line for the next topic
Innocent Assumption
time difference can be
turned into a discrete number
Metastability
Metastability is Rare...
[figure labels: measurement equipment, metastable]
...unless your system runs at GHz speeds!
Metastability is Bad
- metastability is impossible to...
...avoid (in non-trivial circuits)
...detect (reliably)
...resolve (in a meaningful way)
Marino, TC‘81
- probability decreases exponentially over time
Veendrick, SSCS‘80
- serious problem in large circuits
Beer, Ginosar, Cox, Chaney & Zar, DATE‘13
-> EE canon: must use synchronizers
Metastability: It’s worse than Byzantines!
Byzantine faults:
- faulty nodes send arbitrary messages
- correct nodes can process these correctly
Metastability:
- any circuit has input causing metastability
- results in out-of-spec behavior
-> no deterministic correctness guarantee possible
...with traditional logic!
M for Metastability
- replace M inputs by 0s and 1s (wildcards)
- if output does not change, it is stable
- defines metastable closure f_M for any Boolean f

AND | 0 1      AND_M | 0 1 M
0   | 0 0      0     | 0 0 0
1   | 0 1      1     | 0 1 M
               M     | 0 M M
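The wildcard rule on this slide translates directly into code. The sketch below is a software illustration of the definition only, not a circuit construction; the names `M`, `metastable_closure`, and `and_m` are chosen here for illustration.

```python
from itertools import product

M = "M"  # symbol for a metastable (undetermined) signal


def metastable_closure(f):
    """Lift a Boolean function f on 0/1 inputs to inputs from {0, 1, M}."""
    def f_m(*inputs):
        m_positions = [i for i, x in enumerate(inputs) if x == M]
        outputs = set()
        # Treat every M as a wildcard and try all 0/1 replacements.
        for bits in product((0, 1), repeat=len(m_positions)):
            resolved = list(inputs)
            for pos, bit in zip(m_positions, bits):
                resolved[pos] = bit
            outputs.add(f(*resolved))
        # The output is stable only if all replacements agree.
        return outputs.pop() if len(outputs) == 1 else M
    return f_m


and_m = metastable_closure(lambda a, b: a & b)
and_m_table = {(a, b): and_m(a, b) for a in (0, 1, M) for b in (0, 1, M)}
```

Evaluating `and_m` on all nine input pairs yields the AND_M truth table: a 0 input forces the output to 0 even if the other input is M, while 1 AND M stays M. The enumeration takes time exponential in the number of M inputs, which is acceptable for a definition but not for an implementation.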
Metastability: Metastable Closure
- assume basic gates implement metastable closure
- true for standard CMOS
Theorem: Standard logic can implement f_M for any f.
Corollary: LW can be implemented with deterministic correctness and without loss in precision!
Friedrichs, Függer & L., arxiv‘16
...but general construction has exponential size!
Metastability: Metastable Closure
Efficient metastability-containing circuits:
- measure time differences in Gray code (only one M)
Függer, Kinali, L. & Polzer, ASYNC‘17
- sort Gray codes with one M
Bund, L. & Medina, DATE‘17
Bund, L. & Medina, unpublished
- planning high-frequency ASIC MC implementation
-> expect high performance even with bad clocks
-> especially useful in space environments
...or any very fast control loop
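Why Gray code limits the damage to a single M can be seen in a small sketch. It assumes the standard binary-reflected Gray code (the slides do not say which Gray code is used) and models a measurement caught between two adjacent values by marking every bit in which the two codewords differ as M.

```python
def gray_encode(x):
    """Binary-reflected Gray code of the non-negative integer x."""
    return x ^ (x >> 1)


def hamming_distance(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def uncertain_codeword(x, width):
    """Gray codeword for a value caught between x and x + 1.

    Bits on which the codewords of x and x + 1 agree are kept;
    bits on which they differ are marked 'M'.
    """
    lo = format(gray_encode(x), f"0{width}b")
    hi = format(gray_encode(x + 1), f"0{width}b")
    return "".join(a if a == b else "M" for a, b in zip(lo, hi))


between_3_and_4 = uncertain_codeword(3, width=3)  # "M10"
```

In plain binary, 3 (011) and 4 (100) differ in all three bits, so a measurement taken during the transition could become metastable in every position. In Gray code the same transition changes one bit only.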
OP: Complexity of Metastable Closure
Theorem: Implementing f_M is hard in general.
unpublished
Open Problem: If Boolean function f has circuit of size S, how large is an MC variant that can handle k metastable bits?
Clock Generation: Coupling PS and LW
- pulse synchronization guarantees self-stabilization
- LW guarantees near-optimal skew
-> use pulse synchronization alg to “jump-start” LW
Clock Generation: Coupling PS and LW
- pulses (“beats”) reset LW with rough synchrony
- issue: PS alg. inaccurate, messes up stabilized LW
- solution: feedback suggesting time for next beat
OP: Better Coupling Mechanisms
Theorem: Pulse synchronization can be
solved with skew in the order of ε+(θ-1)d.
Khanchandani & L., SSS’16
Open Problem: Find coupling mechanism
relying on underlying pulse synchronization
algorithm only during stabilization.
Counting
- goal: consistent round numbers for pulses
- Byzantine fault-tolerant and self-stabilizing
- but can assume synchronous system now (easier)
Counting: Almost as Easy as Consensus
Theorem: Given synchronous consensus routine C
- tolerating f faults,
- running in R(f) rounds, and
- using messages of size M(f),
we have counting algorithm
- tolerating f faults,
- stabilizing in time O(R(f) log f), and
- using messages of size O(M(f) log f)
L. & Rybicki, SSS’16
requires minor tweak
for randomized algs
OP: Counting “=” Consensus?
Theorem: R-round consensus reduces
to counting with stabilization time R.
Open Problem: Is counting
exactly as hard as consensus?
...or: does self-stabilization
make consensus harder?
Clock Distribution
- everything so far assumed full connectivity
- quadratic number of wires not scalable:
- high energy and area consumption
- high node complexity: fan-in, fan-out, computation
- planar or near-planar topologies better
-> simple topologies & small degrees needed
- incompatible with worst-case distribution of faults!
Clock Distribution
- generate clock signal with presented techniques
goal: distribute clock signal in a scalable manner
- tolerate a few faults locally, many globally
...handling 1 local fault already big deal!
-> roughly n^(1/2) u.i.r. faults can be sustained!
[figure label: clock source]
Clock Distribution: HEX
Dolev, Függer, L., Perner & Schmid, JCSS’16
[figure; label: direction of clock propagation]
- small connectivity
- self-stabilization
- (Byzantine tolerance)
- small skew (neighbors)
Clock Distribution: HEX
- wave propagates around faults
Clock Distribution: HEX
[figure: simulation with uniformly random delays; axes: layer, width]
...much better than worst case!
OP: Other Clock Distribution Topologies
Open Problem: How do other
distribution topologies perform
under u.i.r. link delays?
difficult due to taking median time
e.g. [figure not reproduced]
Clocking Differently: Gradient Clock Sync
Fan & Lynch, PODC’04
- different approach: synchronize along data flow!
- arbitrary undirected graph is given
- goal: minimize local skew (skew between neighbors)
- rationale: synchrony needed for communication only
Theorem: Optimal local skew is in the order of ε log D (D network diameter).
base of log is 1/(θ-1)
L., Locher, Wattenhofer, JACM’10
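To get a feeling for how slowly the bound in the theorem grows, the expression ε log D with base 1/(θ-1) can be evaluated numerically. This is a rough sketch: the constants hidden by "in the order of" are ignored, and the parameter values are made up for illustration.

```python
import math


def local_skew_order(eps, theta, diameter):
    """Evaluate eps * log_b(D) with base b = 1/(theta - 1).

    Constant factors hidden by the order notation are ignored.
    Requires 1 < theta < 2 so that the base exceeds 1.
    """
    if not 1.0 < theta < 2.0:
        raise ValueError("need 1 < theta < 2")
    base = 1.0 / (theta - 1.0)
    return eps * math.log(diameter) / math.log(base)


# Hypothetical values: delay uncertainty eps = 1 (arbitrary time unit),
# clock rates within 1% (theta = 1.01, so the base is about 100),
# network diameter D = 10,000.
example_bound = local_skew_order(eps=1.0, theta=1.01, diameter=10_000)
```

With these values the logarithm is about 2, so the bound is roughly 2ε even for a diameter of 10,000. Better oscillators (θ closer to 1) increase the base and shrink the bound further.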
OP: Fault-tolerant HW Gradient Clock Sync
Open Problem: hardware
implementation & correctness proof
Open Problem: add self-stabilization
Open Problem: add fault-tolerance
...simple in theory, but low-level
implementation must be
efficient and provably correct!
Conclusions
Theorem (informal): Reasoning about
hardware is hardcore DC theory!
Thanks for Listening!
1st Workshop on Hardware Design and Theory (colocated with DISC in Vienna):
www.mpi-inf.mpg.de/departments/algorithms-complexity/hdt2017/
survey:
bulletin.eatcs.org/index.php/beatcs/article/download/348/330
talk slides:
people.mpi-inf.mpg.de/~clenzen/talks/porquerolles17.pptx
positions available:
www.mpi-inf.mpg.de/departments/algorithms-complexity/offers/#c12102