Bridging the gap between asynchronous design and designers

Peter A. Beerel, Fulcrum Microsystems, Calabasas Hills, CA, USA
Jordi Cortadella, Universitat Politècnica de Catalunya, Barcelona, Spain
Alex Kondratyev, Cadence Berkeley Labs, Berkeley, CA, USA

Outline
1. Basic concepts on asynchronous circuit design
   (Tea break)
2. Logic synthesis from concurrent specifications
3. Synchronization of complex systems
   (Lunch)
4. Design automation for asynchronous circuits
   (Tea break)
5. Industrial experiences

Basic concepts on asynchronous circuit design

Outline
• What is an asynchronous circuit?
• Asynchronous communication
• Asynchronous design styles (Micropipelines)
• Asynchronous logic building blocks
• Control specification and implementation
• Delay models and classes of async circuits
• Channel-based design
• Why asynchronous circuits?

Synchronous circuit
[Figure: a pipeline of registers (R) separated by combinational logic (CL), all driven by a global clock CLK]
• Implicit (global) synchronization between blocks
• Clock period > Max Delay (CL + R)

Asynchronous circuit
[Figure: the same R/CL pipeline with the clock replaced by Req/Ack handshakes between stages]
• Explicit (local) synchronization: Req/Ack handshakes

Motivation for asynchronous
• Asynchronous design is often unavoidable: asynchronous interfaces, arbiters, etc.
• Modern clocking is multi-phase and distributed, and virtually "asynchronous" (cf. GALS, next slide):
  • Mesochronous (clock travels together with data)
  • Local (possibly stretchable) clock generation
• A robust asynchronous design flow is coming (e.g. VLSI programming from Philips, Balsa from the University of Manchester, NCL from Theseus Logic, …)

Globally Async Locally Sync (GALS)
[Figure: a clocked domain (R, CL, R with a local CLK) inside an async-to-sync wrapper, talking to the asynchronous world through handshake pairs Req1/Ack1 to Req4/Ack4]

Key Design Differences
Synchronous logic design:
• Proceeds without taking timing correctness (hazards, signal acknowledgment, etc.)
  into account
• Combinational logic and memory latches (registers) are built separately
• Static timing analysis of CL is sufficient to determine the Max Delay (clock period)
• Fixed set-up and hold conditions for latches

Key Design Differences
Asynchronous logic design:
• Must ensure hazard-freedom, signal acknowledgment and local timing constraints
• Combinational logic and memory latches (registers) are often mixed in "complex gates"
• Dynamic timing analysis of logic is needed to determine relative delays between paths
• To avoid complex issues, circuits may be built as delay-insensitive and/or speed-independent (as discussed later)

Verification and Testing Differences
Synchronous logic verification and testing:
• Only functional correctness is verified and tested
• Testing can be done with standard ATE and at low speed (but high-speed testing may be required for DSM)
Asynchronous logic verification and testing:
• In addition to functional correctness, the temporal aspect is crucial: e.g. causality and order, deadlock-freedom
• Testing must cover faults in complex gates (logic + memory) and must proceed at the normal operating rate
• Delay-fault testing may be needed

Synchronous communication
[Figure: data waveform carrying the bits 1 1 0 0 1 0, sampled at clock edges]
• Clock edges determine the time instants at which data must be sampled
• Data wires may glitch between clock edges (set-up/hold times must be satisfied)
• Data are transmitted at a fixed rate (the clock frequency)

Dual rail
[Figure: dual-rail waveform carrying the bits 1 1 1 0 0 0]
• Two wires per bit, each L (low) or H (high)
• "LL" = spacer, "LH" = "0", "HL" = "1"
• n-bit data communication requires 2n wires
• Each bit is self-timed
• Other delay-insensitive codes exist (e.g.
  k-of-n) and event-based signalling (choice criteria: pin and power efficiency)

Bundled data
[Figure: bundled-data waveform carrying the bits 1 1 0 0 1 0, plus a validity signal]
• Validity signal: similar to an aperiodic local clock
• n-bit data communication requires n+1 wires
• Data wires may glitch when not valid
• Signaling protocols:
  • level-sensitive (latch)
  • transition-sensitive (register): 2-phase / 4-phase

Example: memory read cycle
[Figure: Address (A) and Data (D) waveforms showing the valid-address and valid-data windows]
Transition signaling, 4-phase

Example: memory read cycle
[Figure: Address (A) and Data (D) waveforms showing the valid-address and valid-data windows]
Transition signaling, 2-phase

Asynchronous modules
[Figure: a module split into a DATA PATH (Data IN, Data OUT, start, done) and a CONTROL block with an input handshake (req in, ack in) and an output handshake (req out, ack out)]
Signaling protocol:
reqin+ → start+ → [computation] → done+ → reqout+ → ackout+ → ackin+ →
reqin– → start– → [reset] → done– → reqout– → ackout– → ackin–
(more concurrency is also possible)

Asynchronous latches: C element
[Figure: C-element symbol (inputs A, B; output Z) and its static logic implementation between Vdd and Gnd [van Berkel 91]]
A  B | Z+
0  0 | 0
0  1 | Z
1  0 | Z
1  1 | 1
(Z+ is the next output value; Z means the output keeps its previous value.)

C-element: Other implementations
[Figure: dynamic and quasi-static (weak feedback inverter) C-element implementations, inputs A, B, output Z]

Dual-rail logic
[Figure: dual-rail AND gate with inputs (A.t, A.f), (B.t, B.f) and output (C.t, C.f)]
• Valid behavior for a monotonic environment

Completion detection
[Figure: dual-rail logic outputs feeding a completion-detection tree of C-elements that produces done]

Differential cascode voltage switch logic
[Figure: 3-input AND/NAND gate: an N-type transistor network over A.t/A.f, B.t/B.f, C.t/C.f, controlled by start, producing Z.t/Z.f and done]

Examples of dual-rail design
• Asynchronous dual-rail ripple-carry adder (A.
  Martin, 1991)
  • Average-case delay is proportional to log N (N = number of bits)
  • 32-bit adder delay (1.6 µm MOSIS CMOS): 11 ns versus 40 ns for synchronous
  • Async cell transistor count: 34 versus 28 for synchronous
• More recent success stories (modularity and automatic synthesis) of dual-rail logic from Null Convention Logic (Theseus Logic)

Bundled-data logic blocks
[Figure: single-rail logic with a matched delay from start to done]
• Conventional logic + matched delay

Micropipelines (Sutherland 89)
Micropipeline (2-phase) control blocks:
[Figure: Join (C-element), Merge, Select (in, sel → outt/outf), Toggle (in → out0/out1), Call (r1/a1, r2/a2 → r/a) and Request-Grant-Done (RGD) Arbiter (r1/g1/d1, r2/g2/d2)]

Micropipelines (Sutherland 89)
[Figure: micropipeline of latches (L) and logic, with C-elements and delay elements in the Rin/Ain → Rout/Aout request/acknowledge chain]

Data-path / Control
[Figure: latches (L) and logic stages form the data path between Rin/Ain and Rout/Aout; a separate CONTROL block drives the latches]

Control specification
[Figure: waveforms of A (input) and B (output) with events A+, B+, A–, B–]

Control specification
[Figure: waveforms of A and B with events A+, B–, A–, B+]

Control specification
[Figure: signals A, B, C with events A+, B+, C+, A–, B–, C–]

Control specification
[Figure: signals A, B, C with events A+, B+, C+, A–, B–, C–]

Control specification
[Figure: FIFO controller with handshakes Ri/Ao and Ro/Ai, events Ri+, Ro+, Ao+, Ai+, Ri–, Ro–, Ao–, Ai–, implemented with two C-elements]

A simple filter: specification
[Figure: filter block with input channel IN (Rin/Ain) and output channel OUT (Rout/Aout)]
y := 0;
loop
  x := READ (IN);
  WRITE (OUT, (x+y)/2);
  y := x;
end loop

A simple filter: block diagram
[Figure: latches x and y, a bundled-data adder (+) and a control block, with handshakes Rin/Ain, Rx/Ax, Ry/Ay, Ra/Aa, Rout/Aout]
• x and y are level-sensitive latches (transparent when R = 1)
• + is a bundled-data adder (matched delay between Ra and Aa)
• Rin indicates the validity of IN
• After Ain+, the environment is allowed to change IN
• (Rout, Aout) control a level-sensitive latch at the output

A simple filter: control spec.
[Figure: filter block diagram annotated with the control specification over the events:]
Rin+ Rx+ Ry+ Ra+ Rout+ Ain+ Ax+ Ay+ Aa+ Aout+
Rin– Rx– Ry– Ra– Rout– Ain– Ax– Ay– Aa– Aout–

A simple filter: control impl.
[Figure: control implementation using a C-element, over Rin, Ain, Rx, Ax, Ry, Ay, Ra, Aa, Rout, Aout, realizing the events:]
Rin+ Rx+ Ry+ Ra+ Rout+ Ain+ Ax+ Ay+ Aa+ Aout+
Rin– Rx– Ry– Ra– Rout– Ain– Ax– Ay– Aa– Aout–

Taking delays into account
[Figure: circuit with signals x, x', y, z, z' and its specification over x+, x–, y+, y–, z+, z–]
Delay assumptions:
• Environment: 3 time units
• Gates: 1 time unit
Event: x+  x'–  y+  z+  z'–  x–  x'+  z–  z'+  y–
Time:  3   4    5   6   7    9   10   12   13   14

Taking delays into account
[Figure: the same circuit, with y very slow]
Delay assumptions: unbounded delays
Event: x+  x'–  y+  z+  x–  x'+  y–  failure!
Time:  3   4    5   6   9   10   11

Gate vs wire delay models
• Gate delay model: delays in gates, no delays in wires
• Wire delay model: delays in gates and wires

Delay models for async. circuits
• Bounded delays (BD): realistic for gates and wires. Technology mapping is easy; verification is difficult
• Speed independent (SI): unbounded (pessimistic) delays for gates and "negligible" (optimistic) delays for wires. Technology mapping is more difficult; verification is easy
• Delay insensitive (DI): unbounded (pessimistic) delays for gates and wires. The DI class (built out of basic gates) is almost empty
[Figure: diagram relating the classes BD, SI, QDI and DI]
• Quasi-delay insensitive (QDI): delay insensitive except for critical wire forks (isochronic forks).
  In practice, it is the same as speed independent.

Channel-Based Design
[Figure: a synchronous system (blocks sharing a clock) versus an asynchronous system (blocks linked by asynchronous channels)]
• Synchronization and communication between blocks are implemented with handshaking, using asynchronous channels to send and receive "data tokens"

Channel Design: Single Rail
[Figure: 4-phase bundled-data channel between sender and receiver: Req, Ack and Data wires, handshake steps 1 to 4, and the interval in which Data must be stable]
Features:
• One request wire
• One wire per data bit
• One acknowledgment wire
• Has timing assumptions

Channel Design: Dual Rail & 1-of-N
Dual rail:
• Two wires per data bit
• One acknowledgment wire
• Advantage: supports delay-insensitive design

DataT  DataF | Logical value
0      0     | Reset
0      1     | 0
1      0     | 1
1      1     | Invalid

1-of-N:
• Generalization of dual-rail
[Figure: 4-phase 1-of-N channel between sender and receiver: Data (1-of-N) and Ack wires, handshake steps 1 to 4]

Anatomy of a Channel-Based Asynchronous Design
• Architecture is typically a multi-level hierarchy of communicating blocks
• Yields a hierarchical netlist of cells, where at each level blocks communicate along channels
[Figure: an ASIC hierarchy (Main FSM, Register Bank with Reg A/B/C, Memory, Adder, Multiplier, Subtract/Divider, Adder/Mult.) refined down to leaf cells (BN-1, BN-2, BN-3, …, FAN-1, FAN-2, FAN-3, …, FA0) connected by channels]

Asynchronous Cells
[Figure: a cell computing F, with input channels and output channels]
• Definition: the smallest element that communicates with its neighbors along asynchronous channels
• Functionality: reads a subset of input channels, computes F and writes to a subset of output channels
• Linear pipelines: only one input and one output channel

Cells for Non-Linear Pipelines
• Non-linear pipelines:
  • Joins and forks
  • Conditional joins: read only some of the input channels
  • Conditional splits: write only to some of the output channels
[Figure: cells F arranged as Join, Fork, Conditional Join and Conditional Split]

Template-Based Leaf-Cell Design
• Each pipeline style (QDI, timed, …) has a different blueprint
• Create a library using a blueprint to implement the lowest-level communicating blocks
[Figure: blueprint for a QDI N-input M-output pipeline stage built from F blocks, LCD and RCD blocks and C-elements, with 2-input 1-output and 1-input 2-output instances]

Template-Based Leaf-Cell Design
Pros:
• Enables fine-grain 2-D pipelining, yielding high performance
• Simplifies logic synthesis by enabling simple control-circuit generation and re-use of typical datapath synthesis
• Leaf cells can be laid out and verified to create a leaf-cell library, localizing timing assumptions
Cons:
• A unified template may not be optimal in all cases
• In particular, it is less effective for non-pipelined architectures with more complicated control

Motivation (designer's view)
• Modularity for system-on-chip design: plug-and-play interconnectivity
• Average-case performance: no worst-case delay synchronization
• Many interfaces are asynchronous: buses, networks, ...

Motivation (technology aspects)
• Low power: automatic clock gating
• Electromagnetic compatibility: no peak currents around clock edges
• Security: no "electromagnetic difference" between logical "0" and "1" in dual-rail code
• Robustness: high immunity to technology and environment variations (temperature, power supply, ...)
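The security bullet rests on a property of the dual-rail code from the channel-design slides. Every valid codeword raises exactly one wire per bit. As a result, a 4-phase cycle (spacer, codeword, spacer) switches the same number of wires whatever the data value is.

The sketch below checks this. It assumes the (DataT, DataF) encoding from the dual-rail table and a return-to-spacer 4-phase protocol. The function names are hypothetical. The count is idealized: it counts wire transitions only and ignores differences in wire capacitance.

```python
# Hypothetical helpers illustrating the dual-rail code: each bit uses two
# wires (t, f); (0, 0) is the spacer/reset, (0, 1) encodes 0, (1, 0) encodes 1,
# and (1, 1) is invalid.

SPACER = (0, 0)


def encode_bit(bit: int) -> tuple[int, int]:
    """Return the (t, f) wire levels for one data bit."""
    return (1, 0) if bit else (0, 1)


def encode_word(value: int, n: int) -> list[tuple[int, int]]:
    """Dual-rail encode the n low-order bits of value, LSB first."""
    return [encode_bit((value >> i) & 1) for i in range(n)]


def transitions(a, b) -> int:
    """Count wires that change level between two dual-rail wire states."""
    return sum((ta != tb) + (fa != fb) for (ta, fa), (tb, fb) in zip(a, b))


def four_phase_cycle_transitions(value: int, n: int) -> int:
    """Wire transitions for spacer -> codeword -> spacer (one data token)."""
    spacer = [SPACER] * n
    word = encode_word(value, n)
    return transitions(spacer, word) + transitions(word, spacer)


if __name__ == "__main__":
    n = 8
    counts = {four_phase_cycle_transitions(v, n) for v in range(2 ** n)}
    print(counts)  # every 8-bit value costs the same: {16}
```

By contrast, on a single-rail bus the number of wires that switch between consecutive words equals their Hamming distance, so it depends on the data being sent.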
Dissuasion
• Concurrent models for specification: CSP, Petri nets, ...: no more FSMs
• Difficult to design: hazards, synchronization
• Complex timing analysis: difficult to estimate performance
• Difficult to test: no way to stop the clock

But ... some success stories
• Philips
• AMULET microprocessors
• Sharp
• Intel (RAPPID)
• Start-up companies: Theseus Logic, Fulcrum Microsystems, Self-Timed Solutions
• Recent blurb: "It's Time for Clockless Chips", by Claire Tristram (MIT Technology Review, v. 104, no. 8, October 2001: http://www.technologyreview.com/magazine/oct01/tristram.asp)
…
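The first objection under Dissuasion is that specification moves from FSMs to concurrent models such as Petri nets. A minimal token-game sketch makes this concrete. It is a hypothetical illustration, not tooling from this tutorial: the helpers `enabled`, `fire` and `reachable_markings` and all place names are invented here.

The net encodes a C-element with its environment, following the C-element truth table: the output rises once both inputs are 1 and falls once both are 0. It assumes each input toggles exactly once per output transition. Six transitions describe the behavior. In the initial marking, A+ and B+ are enabled at the same time, and exploring the token game yields the eight states an FSM would have to list explicitly.

```python
from collections import Counter, deque

# Hypothetical Petri net for a C-element plus environment.
# Each transition (a signal event) maps to (input places, output places).
C_ELEMENT_NET = {
    "A+": (["a0"], ["a1"]),
    "B+": (["b0"], ["b1"]),
    "C+": (["a1", "b1"], ["a2", "b2"]),  # output rises only after both inputs rose
    "A-": (["a2"], ["a3"]),
    "B-": (["b2"], ["b3"]),
    "C-": (["a3", "b3"], ["a0", "b0"]),  # output falls only after both inputs fell
}
INITIAL = ["a0", "b0"]  # one token on each input branch


def enabled(net, marking):
    """Transitions whose input places all hold at least one token."""
    return [t for t, (pre, _) in net.items() if all(marking[p] > 0 for p in pre)]


def fire(net, marking, t):
    """Return the marking reached by firing the enabled transition t."""
    pre, post = net[t]
    new = Counter(marking)
    for p in pre:
        new[p] -= 1
    for p in post:
        new[p] += 1
    return +new  # unary plus drops places left with zero tokens


def reachable_markings(net, initial):
    """Breadth-first token game: the state graph an FSM would enumerate."""
    start = Counter(initial)
    seen = {frozenset(start.items())}
    queue = deque([start])
    while queue:
        marking = queue.popleft()
        for t in enabled(net, marking):
            nxt = fire(net, marking, t)
            key = frozenset(nxt.items())
            if key not in seen:
                seen.add(key)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    print(enabled(C_ELEMENT_NET, Counter(INITIAL)))  # ['A+', 'B+'] are concurrent
    print(len(reachable_markings(C_ELEMENT_NET, INITIAL)))  # 8
```

Adding more concurrent inputs grows the net linearly, while the explicit state graph grows with every interleaving. This is one reason concurrent models are used for such specifications.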