The Many Faces of Clock Synchronization
Christoph Lenzen (MPI for Informatics)
thanks to my co-authors: Johannes Bund, Danny Dolev, Stephan Friedrichs, Matthias Függer, Markus Hofstätter, Florian Huemer, Pankaj Khanchandani, Attila Kinali, Thomas Locher, Moti Medina, Martin Perner, Thomas Polzer, Markus Posch, Joel Rybicki, Ulrich Schmid, Martin Sigl, Andreas Steininger, Roger Wattenhofer

Chips are Distributed Systems
- very large (>10^10 transistors) -> fault-tolerance mandatory
- very fast (>10^9 cycles/s) -> communication "slow"
- highly concurrent/parallel -> synchronous operation

Clocking VLSI Circuits
[Figure: timeline of rounds r−1, r, r+1, r+2 with the phases compute, send, receive]

Clock Trees
[Figure: clock tree; leaves are clocked elements (e.g. registers)]
Distribute clock signal from single source!
+ very simple
+ self-stabilizing: recovers from any transient faults
+ ca. 20 ps = 2*10^-11 s precision (single chip)

Clock Trees: Scalability Issues
- clock tree is single point of failure
  -> components must be extremely reliable
- tree dist./physical dist. = Ω(L) (L side length of chip)
  -> max. difference of arrival times between adjacent gates grows linearly with L
  -> clock frequency goes down with chip size
- countermeasure: use higher voltage and wider wires
  -> electromagnetic interference causes trouble
  -> strong currents induce large power consumption

Clock Generation
[Figure label: skew]
- fully connected system of n nodes
- f < n/3 Byzantine faults
- bounded drift: HW clock rates between 1 and θ
- bounded delay: messages take 0 to d time
- minimize skew (max. time between matching "ticks")

Clock Generation: Srikanth-Toueg (Srikanth & Toueg, JACM '87)
Theorem: ST algorithm has skew 2d.
proof: (correct) node sends tick k+1
-> f+1 nodes sent tick k
-> all nodes send tick k <d time later
-> all nodes send tick k+1 <2d time later

Clock Generation: DARTS
- HW realization of ST algorithm
  FPGA: 4 ns skew
  ASIC: 1.5 ns skew
- not self-stabilizing
- if delays are from (d-ε,d), skew lower bound ≈ε, but ST has skew d+ε
refs: Függer & Schmid DC '12; Ferringer, Fuchs, Kempf & Steininger DFT '06; Fuchs & Steininger JECE '11; Lynch & Welch Inf. and Control '84

Clock Generation: Pulse Generation
- fully connected system of n nodes
- f < n/3 Byzantine faults
- bounded drift: HW clock rates between 1 and θ
- bounded delay: messages take 0 to d time
- self-stabilization: arbitrary initial state
- minimize skew (max. time between matching "ticks")
- minimize stabilization time (max. time to recover)

Clock Generation: Pulse Generation
authors                          rand.  stab. time   bandwidth    appeared in
S. Dolev & Welch                 yes    exp(O(f))    O(1)         JACM'04
Daliot, D. Dolev & Parnas        no     O(f^3)       O(log f)     SSS'03
D. Dolev & Hoch                  no     O(f)         O(f log f)   SSS'07
D. Dolev, Függer, L. & Schmid    yes    O(f)         O(1)         JACM'14
L. & Rybicki                     yes    polylog f    polylog f    arxiv'17
                                 yes    O(log f)     poly f
                                 no     O(f)         O(log f)
non-trivial messages -> minimize channel bandwidth

OP: Pulse Generation "=" Consensus?
Theorem: Given synchronous consensus routine C
- tolerating f faults,
- running in R(f) rounds, and
- using messages of size M(f),
we have pulse synchronization algorithm of skew 2d
- tolerating f faults,
- stabilizing in time O(R(f) log f), and
- using bandwidth O(M(f) log f)
Open Problem: Is pulse synchronization at least as hard as consensus?

Clock Generation: Lynch-Welch (Lynch & Welch, Inf. & Comp. '88)
goal: match skew lower bound (no self-stabilization)
- assume (coarse) initial synchrony
- simulate synchronous rounds; each node:
  1. broadcasts once (also to self)
  2. receives broadcasts
  3. updates clock
[Figure: local time axis of one round: round start, reception interval, clock update, predicted | actual start of next round]

Clock Generation: Lynch-Welch
1. receive clock pulses in window around expected arrival
2. determine average of (f+1)th and (n-f)th arrival time
3. align clock to this value by phase shift
Theorem: LW algorithm has skew in the order of ε+(θ-1)d.
proof by picture: [Figure; labels: ETA, ETA]

Clock Generation: LW in Hardware
- hybrid SW/HW implementations: TTP, FlexRay
- skew in microseconds range
  -> full HW implementation required
- 0.2 ns skew (Huemer, Kinali & L., ISVLSI '16)
for scale: 1 ns^-1 = 1 GHz
refs: Kopetz & Bauer Proc. IEEE '03; Függer, Armengaud & Steininger IEEE Trans. Ind. Informat. '09

Clock Generation: LW in Hardware
                                               skew     technology    reliability   dev. costs
LW in hardware                                 200 ps   FPGAs         f=1           3,000€ HW
vs. Shamanna, Kurd, Douglas & Morrise VLSIC '10  20 ps    single chip   f=0           $$$$$

*Stephan's favorite opening line for the next topic

Innocent Assumption
time difference can be turned into a discrete number

Metastability

Metastability is Rare...
[Figure: measurement equipment; metastable signal]
...unless your system runs at GHz speeds!

Metastability is Bad
- metastability is impossible to...
  ...avoid (in non-trivial circuits)
  ...detect (reliably)
  ...resolve (in a meaningful way)
- probability decreases exponentially over time
  serious problem in large circuits
  -> EE canon: must use synchronizers
refs: Marino TC '81; Veendrick SSCS '80; Beer, Ginosar, Cox, Chaney & Zar DATE '13

Metastability: It's worse than Byzantines!
Byzantine faults:
- faulty nodes send arbitrary messages
- correct nodes can process these correctly
Metastability:
- any circuit has input causing metastability
- results in out-of-spec behavior
-> no deterministic correctness guarantee possible
...with traditional logic!

M for Metastability
- replace M inputs by 0s and 1s (wildcards)
- if output does not change, it is stable
- defines metastable closure f_M for any Boolean f

AND   | 0 1
  0   | 0 0
  1   | 0 1

AND_M | 0 1 M
  0   | 0 0 0
  1   | 0 1 M
  M   | 0 M M

Metastability: Metastable Closure
- assume basic gates implement metastable closure
  true for standard CMOS
Theorem: Standard logic can implement f_M for any f.
Corollary: LW can be implemented with deterministic correctness and without loss in precision! (Friedrichs, Függer & L., arxiv '16)
...but general construction has exponential size!

Metastability: Metastable Closure
Efficient metastability-containing circuits:
- measure time differences in Gray code (only one M)
- sort Gray codes with one M
- planning high-frequency ASIC MC implementation
  -> expect high performance even with bad clocks
  -> especially useful in space environments
  ...or any very fast control loop
refs: Bund, L. & Medina DATE '17; Bund, L. & Medina unpublished; Függer, Kinali, L. & Polzer ASYNC '17

OP: Complexity of Metastable Closure
Theorem: Implementing f_M is hard in general. (unpublished)
Open Problem: If Boolean function f has circuit of size S, how large is an MC variant that can handle k metastable bits?
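The wildcard definition of the metastable closure from the "M for Metastability" slide can be sketched in software as follows. This is only a minimal model of the definition, not the circuit construction from the cited work; the names closure, and_m, gray and superpose are hypothetical helpers introduced here for illustration. The sketch enumerates all 2^k resolutions of k metastable inputs, so it is exponential in k. It also illustrates the Gray code remark: adjacent code words differ in one bit, so a measurement caught between two adjacent values contains exactly one M.

```python
from itertools import product

M = "M"  # assumed symbol for a metastable (unknown) signal value


def closure(f):
    """Return the metastable closure f_M of a Boolean function f.

    f takes bits (0/1) and returns a bit. f_M accepts inputs from
    {0, 1, M}: every M is replaced by both 0 and 1 (wildcards). The
    output is stable if all resolutions agree, and M otherwise.
    """
    def f_m(*inputs):
        m_positions = [i for i, x in enumerate(inputs) if x == M]
        outputs = set()
        for bits in product((0, 1), repeat=len(m_positions)):
            resolved = list(inputs)
            for pos, bit in zip(m_positions, bits):
                resolved[pos] = bit
            outputs.add(f(*resolved))
        return outputs.pop() if len(outputs) == 1 else M
    return f_m


def and_gate(a, b):
    """Plain Boolean AND on bits."""
    return a & b


and_m = closure(and_gate)


def gray(k, width):
    """Reflected binary Gray code of k as a tuple of bits, MSB first."""
    g = k ^ (k >> 1)
    return tuple((g >> i) & 1 for i in reversed(range(width)))


def superpose(x, y):
    """Bitwise superposition: positions where x and y differ become M."""
    return tuple(a if a == b else M for a, b in zip(x, y))


if __name__ == "__main__":
    # Reproduce the AND_M table row by row.
    for a in (0, 1, M):
        print(a, [and_m(a, b) for b in (0, 1, M)])
    # A value caught between 3 and 4 has a single metastable bit.
    print(superpose(gray(3, 3), gray(4, 3)))  # ('M', 1, 0)
```

A stable 0 on one input masks a metastable value on the other, which is why and_m(0, M) is 0 while and_m(1, M) is M.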
Clock Generation: Coupling PS and LW
- pulse synchronization guarantees self-stabilization
- LW guarantees near-optimal skew
-> use pulse synchronization alg to "jump-start" LW

Clock Generation: Coupling PS and LW
- pulses ("beats") reset LW with rough synchrony
- issue: PS alg. inaccurate, messes up stabilized LW
- solution: feedback suggesting time for next beat

OP: Better Coupling Mechanisms
Theorem: Pulse synchronization can be solved with skew in the order of ε+(θ-1)d. (Khanchandani & L., SSS '16)
Open Problem: Find coupling mechanism relying on underlying pulse synchronization algorithm only during stabilization.

Counting
- goal: consistent round numbers for pulses
- Byzantine fault-tolerant and self-stabilizing
- but can assume synchronous system now (easier)

Counting: Almost as Easy as Consensus
Theorem: Given synchronous consensus routine C
- tolerating f faults,
- running in R(f) rounds, and
- using messages of size M(f),
we have counting algorithm
- tolerating f faults,
- stabilizing in time O(R(f) log f), and
- using messages of size O(M(f) log f)
(L. & Rybicki, SSS '16)
requires minor tweak for randomized algs

OP: Counting "=" Consensus?
Theorem: R-round consensus reduces to counting with stabilization time R.
Open Problem: Is counting exactly as hard as consensus?
...or: does self-stabilization make consensus harder?

Clock Distribution
- everything so far assumed full connectivity
  quadratic number of wires
- not scalable:
  - high energy and area consumption
  - high node complexity: fan-in, fan-out, computation
  - planar or near-planar topologies better
  -> simple topologies & small degrees needed
- incompatible with worst-case distribution of faults!
Clock Distribution
- generate clock signal with presented techniques
- goal: distribute clock signal in a scalable manner
- tolerate a few faults locally, many globally
  ...handling 1 local fault already big deal!
  -> roughly n^(1/2) u.i.r. faults can be sustained!
[Figure label: clock source]

Clock Distribution: HEX (Dolev, Függer, L., Perner & Schmid, JCSS '16)
[Figure: HEX grid with direction of clock propagation]
- small connectivity
- self-stabilization
- (Byzantine tolerance)
- small skew (neighbors)

Clock Distribution: HEX
- wave propagates around faults

Clock Distribution: HEX
[Plot: simulation with uniformly random delays; axes: layer, width]
...much better than worst case!

OP: Other Clock Distribution Topologies
Open Problem: How do other distribution topologies perform under u.i.r. link delays?
difficult due to taking median time, e.g. [figure]

Clocking Differently: Gradient Clock Sync (Fan & Lynch, PODC '04)
- different approach: synchronize along data flow!
- arbitrary undirected graph is given
- goal: minimize local skew (skew between neighbors)
- rationale: synchrony needed for communication only
Theorem: Optimal local skew is in the order of ε log D (D network diameter). (L., Locher & Wattenhofer, JACM '10)
base of log is 1/(θ-1)

OP: Fault-tolerant HW Gradient Clock Sync
Open Problem: hardware implementation & correctness proof
Open Problem: add self-stabilization
Open Problem: add fault-tolerance
...simple in theory, but low-level implementation must be efficient and provably correct!

Conclusions
Theorem (informal): Reasoning about hardware is hardcore DC theory!

Thanks for Listening!
1st Workshop on Hardware Design and Theory (colocated with DISC in Vienna): www.mpi-inf.mpg.de/departments/algorithms-complexity/hdt2017/
survey: bulletin.eatcs.org/index.php/beatcs/article/download/348/330
talk slides: people.mpi-inf.mpg.de/~clenzen/talks/porquerolles17.pptx
positions available: www.mpi-inf.mpg.de/departments/algorithms-complexity/offers/#c12102