Advanced Digital Design
Asynchronous Design: DI Methods
A. Steininger
Vienna University of Technology
Lecture "Advanced Digital Design" © A. Steininger / TU Vienna

Outline
  Delay-insensitive design: principle
  NULL Convention Logic
  Code conditions for DI logic
  Four-State Logic

Asynchronous Philosophy (recall)
  "The control flow requires agreement between source and sink. For this purpose they need to communicate."
  Source indicates capture condition for sink.
  Sink indicates issue condition for source.
  "HANDSHAKE"

Handshake Principle (recall)
  [Figure: SRC, f(x), SNK connected by REQ and ACK]
  When can SNK use its input? When it is valid and consistent.
    REQ: "Data word valid, you can use it"
  When can SRC apply the next input? When SNK has consumed the previous one.
    ACK: "Data word consumed, send the next"

A Very Important Detail (recall)
  The handshake establishes a closed-loop control for the data flow between sender and receiver.
  This makes operation more robust than in the synchronous (= open-loop) case.
  The art of asynchronous design is to make many of these closed loops interoperate properly.
  This is much more complicated than a synchronous design.

Bundled Data at a Glance
  single-rail data coding
  4- or 2-phase handshake
  Source: [Sparso 06]

Very disappointing…
  For a closed loop we need to measure the quantity of interest.
  So far we have not done that:
    We have not measured validity & consistency.
    We have used time as an indirect measure instead.
  Thus bounded-delay methods do not provide the benefits of a closed loop.
  BUT: Can we measure validity & consistency at all?

Criticality of ACK (recall)
  [Figure: SRC, f(x), SNK, L2, "capture!"]
  cannot measure the "act of capturing" as an event
  use the latching command instead
  the fork produces a race between the trigger process and the next data wave
  the race is uncritical (but still exists!)

Criticality of REQ (recall)
  [Figure: SRC, f(x), SNK]
  cannot use the issue trigger as an event: it produces an unacceptable race between data and REQ
  must introduce a timer (bounded delay)
  OR: find a better event (downstream): completion detection

Completion Detection
  In order to judge when data are valid & consistent we need to be able to see when this is NOT the case
    not possible with Boolean logic
    need a representation for INVALID
  an ACK in parallel to data (bundled data) will always cause a race
    need more than two signal states for every individual bit (!)
    need more than one rail per bit

Multi-level Logic
  use more than two (e.g. three) voltage levels per rail
  allows expressing "invalid" in the currently "forbidden" area between "HI" and "LO"
  requires two thresholds for every gate input
  output must be able to drive three different levels reliably
  causes substantial technological problems
  not further pursued

Our Options (recall)
  We must only use consistent input vectors.
  How can we tell an input vector is consistent?
  (1) use TIME to mark consistent phases
        synchronous approach / global time base
        asynchronous / bounded delay
  (2) use CODING to add information
        asynchronous / delay insensitive

Terminology (recall)
  consistent data word (DW): all bits belong to the same context
  valid signal: result of a function applied to a consistent DW

NULL Convention Logic
  Add the value NULL to the alphabet
  two-rail coding: signal X is carried on the rails X.a and X.b; TRUE and FALSE together are "DATA"

    X.a  X.b  meaning
    0    0    NULL (N)
    0    1    TRUE (T)
    1    0    FALSE (F)
    1    1    illegal

NCL Functions
  naive approach: if any input is "N" then output "N"

    AND | T  F  N
    ----+---------
     T  | T  F  N
     F  | F  F  N
     N  | N  N  N

    OR  | T  F  N
    ----+---------
     T  | T  T  N
     F  | T  F  N
     N  | N  N  N

    NOT |
    ----+---
     T  | F
     F  | T
     N  | N

NCL Flow Control
  NULL waves enframe DATA waves
  [Figure: timing diagram over time t of several bits alternating between NULL and DATA (TRUE/FALSE); the interval in which all bits are DATA is marked "consistent DATA"]
  Completion detection = check whether all bits are "DATA" (completeness of DATA)

Still Problems …
  What about this situation?
  [Figure: timing diagram over time t of several bits and the output; a slow bit still shows its old DATA value while the fast bits already carry the next DATA wave; the interval is marked "consistent DATA"]
  Fast bits may catch up with a slow bit from the previous word.
  The word containing the "old" bit is considered consistent!

Solution Principle
  Enforce "completeness of NULL" as well:
  The output must not go to NULL before all inputs have changed to NULL.
  In a closed-loop configuration this keeps the slow paths in synchrony with the fast ones.
  We need a different truth table when the output is NULL.

Two Truth Tables

    for DATA waves:

    AND | T  F  N
    ----+---------
     T  | T  F  N
     F  | F  F  N
     N  | N  N  N

    for NULL waves:

    AND | T  F  N
    ----+---------
     T  | T  F  D
     F  | F  F  D
     N  | D  D  N

  D … DATA (T or F): must hold output in last valid state before new input is complete
  need "hysteresis"
  need to consider current output in truth table
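
The two truth tables above can be folded into one rule: switch on complete DATA, switch on complete NULL, otherwise hold the current output. The following is a minimal Python sketch of that rule, not the lecture's implementation. It assumes the rail assignment from the coding table, (X.a, X.b) = (0,1) for TRUE and (1,0) for FALSE; the names `V`, `naive_and`, `ncl_and`, `run` and `scenario` are hypothetical and chosen only for illustration.

```python
from enum import Enum


class V(Enum):
    """Dual-rail NCL values as (X.a, X.b) pairs, following the coding table."""
    N = (0, 0)  # NULL (spacer)
    T = (0, 1)  # TRUE
    F = (1, 0)  # FALSE


def naive_and(a: V, b: V) -> V:
    """Naive NCL AND: any NULL input forces a NULL output."""
    if a is V.N or b is V.N:
        return V.N
    return V.T if (a is V.T and b is V.T) else V.F


def ncl_and(a: V, b: V, y_prev: V) -> V:
    """AND with hysteresis: switch only on complete DATA or complete NULL."""
    if a is not V.N and b is not V.N:      # completeness of DATA
        return V.T if (a is V.T and b is V.T) else V.F
    if a is V.N and b is V.N:              # completeness of NULL
        return V.N
    return y_prev                          # mixed inputs: hold last output


def run(inputs):
    """Feed (a, b) pairs through both gates; the hysteresis gate starts at NULL."""
    y = V.N
    trace = []
    for a, b in inputs:
        y = ncl_and(a, b, y)
        trace.append((naive_and(a, b).name, y.name))
    return trace


# Input a is fast (T -> N -> T); input b is slow and still holds the old F.
scenario = [(V.N, V.N), (V.T, V.F), (V.N, V.F), (V.T, V.F), (V.N, V.N)]
for naive, held in run(scenario):
    print(f"naive={naive}  hysteresis={held}")
```

In this scenario the naive gate shows F, N, F at its output, which downstream logic would read as two separate DATA waves, while the gate with hysteresis stays at F until both inputs have returned to NULL.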

Feedback Gate
  [Figure: AND gate (&) with inputs A and B; the output Y is fed back as an additional input. The accompanying table lists the next output Y' for every combination of A, B and the current output Y (values T, F, N); combinations with Y ≠ Y' are unstable.]

No more Problems …
  Have we solved the problem?
  [Figure: timing diagram over time t of several bits and the output; the interval is marked "consistent DATA"]
  YES! The output now remains at DATA with the slowest bit, thus preventing (via the closed loop) the fast bits from conveying the next DATA wave.

NCL Gates
  The desired hysteresis requires an NCL gate to hold its output until all inputs are DATA or all inputs are NULL
  need storage capability (or feedback loop) even in a combinational gate
  [Figure: gate with inputs X1 (X1.a, X1.b) and X2 (X2.a, X2.b), two memory cells (Mem) and output Y (Y.a, Y.b)]

NCL Gate Implementation
  [Figure: transistor-level gate with inputs X1 (X1.a, X1.b) and X2 (X2.a, X2.b), memory cells (Mem) and output Y (Y.a, Y.b); shown for one output rail only]
  p- and n-stack not dual
  memory cell at output
  CMOS transistors only, but no standard cells
  [G. Sobelman, K. Fant: CMOS Circuit Design of Threshold Gates with Hysteresis]

The Charm of NCL
  self-regulating data flow
  in a NULL-initialized circuit a DATA front will propagate towards the output
  alternating waves of NULL and DATA pace the data flow (which, in some sense, forms the "clock")
  based on direct assessment of validity & consistency
  no delay assumptions necessary (ideally), no "worst case", …
  globally applicable solution

Validity and Consistency
  Consistency (multiple bits @ input)
    all bits that are combined are valid and belong to the same context
  Validity (single bit @ output)
    the bit is the stable result of a combination of consistent bits
  Consistency implies validity (by definition) but NOT vice versa!

Validity & Consistency in NCL
  Validity:
    output is changed only when consistent input is available ("hold" in truth table)
    coding ensures a direct transition from one valid code to another (NULL is valid, but a spacer only)
    continuous validity
  Consistency:
    NULL spacer between DATA waves allows identification of context
    synchronization of context by virtue of the "completeness of NULL" condition

What about sync. & BD?
  Timing ensures that every data item is both valid and consistent at the time it is used:
    choice of clock period (sync)
    choice of delay values (BD)
  In contrast to NCL, (temporary) invalidity & inconsistency of data is admitted.
  No explicit measures (other than timing) are taken or necessary to cope with these issues.

Delay Models (recall)
  synchronous model
    known bounds for delays, global timing
  bounded delay model (BD, fundamental)
    known bounds for absolute delays, local timing
  scalable-delay-insensitive model (SDI)
    bounds for relative deviation between delays known
  quasi-delay-insensitive (QDI)
    output paths of a fork have the same delay
  delay insensitive (DI)
    no restrictions on delays (just finite)

NCL: A Brief Summary
  validity & consistency directly visible
  no timing assumptions required (ideally)
  "delay insensitive" (ideally)
  suitable for CMOS implementation
  coding of one bit on two rails
  2 memory cells per combinational output
  efficiency: 50% of the data flow consists of unproductive NULL waves
  patented and industrially used

NCL at a Glance
  dual-rail data coding
  4-phase handshake
  Source: [Sparso 06]
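
The completion detection summarized above can be illustrated at word level. The sketch below is an assumption-laden model rather than the lecture's circuit: it treats the detector as a function that reports "done" on complete DATA, "not done" on complete NULL, and otherwise holds its previous answer, and it compares this with a detector that checks completeness of DATA only. The stimulus `trace` is an open-loop sequence invented to provoke the problem; in a real NCL pipeline the held completion signal would stall the source instead. All names (`completion`, `captured_words`, `trace`) are hypothetical.

```python
NULL, TRUE, FALSE = (0, 0), (0, 1), (1, 0)   # (X.a, X.b) per the NCL coding table


def is_data(bit):
    """A dual-rail bit is DATA when it is TRUE or FALSE."""
    return bit in (TRUE, FALSE)


def completion(word, done_prev):
    """Completion detector with hysteresis over a dual-rail word."""
    if all(is_data(bit) for bit in word):
        return True            # completeness of DATA
    if all(bit == NULL for bit in word):
        return False           # completeness of NULL
    return done_prev           # partial wave: hold previous decision


def captured_words(trace, with_hysteresis=True):
    """Return the words latched on each rising edge of the completion signal."""
    done, words = False, []
    for word in trace:
        if with_hysteresis:
            new_done = completion(word, done)
        else:
            new_done = all(is_data(bit) for bit in word)  # DATA check only
        if new_done and not done:
            words.append(word)
        done = new_done
    return words


trace = [
    (NULL, NULL),
    (TRUE, NULL),    # fast bit 0 arrives
    (TRUE, FALSE),   # slow bit 1 arrives: first word complete
    (NULL, FALSE),   # bit 0 already returns to NULL
    (TRUE, FALSE),   # bit 0 carries new data, bit 1 still holds the old value
    (NULL, FALSE),
    (NULL, NULL),    # complete NULL wave
    (NULL, TRUE),
    (FALSE, TRUE),   # second consistent word
]

print(len(captured_words(trace, with_hysteresis=False)))  # 3: one word is inconsistent
print(len(captured_words(trace, with_hysteresis=True)))   # 2: only consistent words
```

Without the completeness-of-NULL condition the mixed word in the middle of the trace is accepted as if it were a new data item; with it, a new word is accepted only after every bit has passed through NULL.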

Our Options (recall)
  We must only use consistent input vectors.
  How can we tell an input vector is consistent?
  (1) use TIME to mark consistent phases
        synchronous approach / global time base
        asynchronous / bounded delay
  (2) use CODING to add information
        asynchronous / delay insensitive

Conditions for DI Coding
  completion detection needs ONE UNIQUE event:
  (C1) Identification of every context switch
       It must be possible to clearly separate two successive data words under all circumstances
       => prohibit having no event
       Example: 0,0,0 followed by 0,0,0: ?
  (C2) Unique context membership
       The transition from one valid code word to the next must be unambiguous, i.e. no intermediate state may be a valid code
       => prohibit having more than one event
       Example: 0,0,0 -> 1,0,0 -> 1,0,1 -> 1,1,1: ?

What about NCL's Coding?
  (C1) Return to NULL forces separation between successive data waves
  (C2) Coding scheme guarantees direct switch from one legal value to the next (only one rail changes!)
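
Conditions (C1) and (C2) can be checked mechanically for small examples. The sketch below is one possible formalization, under the assumption that the rails which differ between two successive code words may switch in any order, so every partial combination is a possible intermediate state. The helper names (`intermediates`, `satisfies_c1`, `satisfies_c2`, `single_rail_valid`, `ncl_valid`) and the flattened word layout (a0, b0, a1, b1, ...) are hypothetical.

```python
from itertools import product


def intermediates(src, dst):
    """States reachable between src and dst when the changing rails switch in any order."""
    diff = [i for i, (s, d) in enumerate(zip(src, dst)) if s != d]
    states = set()
    for mask in product((False, True), repeat=len(diff)):
        if not any(mask) or all(mask):
            continue                      # skip src and dst themselves
        state = list(src)
        for switched, i in zip(mask, diff):
            if switched:
                state[i] = dst[i]
        states.add(tuple(state))
    return states


def satisfies_c1(src, dst):
    """(C1): at least one event separates two successive words."""
    return src != dst


def satisfies_c2(src, dst, is_valid):
    """(C2): no intermediate state is itself a valid code word."""
    return not any(is_valid(s) for s in intermediates(src, dst))


def single_rail_valid(word):
    """In plain binary coding every bit pattern is a legal word."""
    return True


def ncl_valid(word):
    """Flattened dual-rail word (a0, b0, a1, b1, ...): all NULL or all DATA."""
    pairs = list(zip(word[0::2], word[1::2]))
    all_null = all(p == (0, 0) for p in pairs)
    all_data = all(p in ((0, 1), (1, 0)) for p in pairs)
    return all_null or all_data


# Single-rail examples from the slides.
print(satisfies_c1((0, 0, 0), (0, 0, 0)))                     # False: no event
print(satisfies_c2((0, 0, 0), (1, 1, 1), single_rail_valid))  # False: valid intermediates

# NCL: NULL word to a two-bit DATA word (TRUE, FALSE).
print(satisfies_c2((0, 0, 0, 0), (0, 1, 1, 0), ncl_valid))    # True

# NCL without the NULL spacer: DATA directly to DATA.
print(satisfies_c2((0, 1, 0, 1), (1, 0, 1, 0), ncl_valid))    # False
```

The last call suggests why the dual-rail code relies on the NULL wave: without the spacer, a direct DATA-to-DATA change can pass through another complete DATA word, and repeating the same word would produce no event at all.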

  NCL coding of signal X on rails X.a and X.b:

    X.a  X.b  value
    0    0    NULL
    0    1    TRUE
    1    0    FALSE
    1    1    illegal

Synchronization of Waves

    A  B  Y
    0  0  0
    0  1  0
    1  0  0
    1  1  1

  [Figure: AND gate (&) with inputs A, B and output Y; the inputs change from 1,0 via NULL to 0,1; the output shows N, 0, N, 0, N]
  no glitch!
  successive "0"s clearly separable

More Efficient Coding?
  NCL employs a 4-phase (RTZ) version of transition signaling.
  The "return to zero" is due to the NULL waves.
  The NULL waves are unproductive and hence undesired.
  Can we employ 2-phase (NRZ) signaling instead?

NCL vs. Transition Signaling
  [Figure: waveforms on rails A0 and A1 for the sequence A=0, A=1, A=1, A=1, A=0, once for transition signaling and once for NULL Convention Logic]

Transition Signaling Protocol
  Source: [Sparso 06]
  high throughput
  difficult to implement: how to attain transition-based completion detection?
  Alternative: 2-phase protocol based on state signaling

Four-State Logic (FSL)
  Use 2 codes per logic value
  two-rail coding: signal X is carried on the rails X.a and X.b

FSL Flow Control
  Alternate code sets ("phase")
  [Figure: timing diagram over time t comparing NCL waveforms (NULL, TRUE, FALSE) with FSL waveforms (codes l, h, L, H) in alternating phases φ1 and φ0; intervals in which all bits share a phase are marked "consistent"]
  Completion detection: check whether all bits belong to the same phase

FSL AND-Gate: Truth Table

    IN_1 \ IN_2 | l  h  L  H
    ------------+------------
         l      | l  l  *  *
         h      | l  h  *  *
         L      | *  *  L  L
         H      | *  *  L  H

  * … hold last valid output

Four-State Logic (FSL)
  An FSL gate holds its output until all inputs are in the same phase
  need storage capability (or feedback loop) even in a combinational gate
  [Figure: gate with inputs X1 (X1.a, X1.b) and X2 (X2.a, X2.b), two memory cells (Mem) and output Y (Y.a, Y.b)]
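
The FSL gate behaviour described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the lecture's circuit: it uses the rail encoding tabulated on the "FSL at a Glance" slide (one phase: LO = 0,0 and HI = 1,1; the other phase: LO = 0,1 and HI = 1,0), from which the logic value is rail a and the phase is a XOR b. The function names (`encode`, `value_of`, `phase_of`, `fsl_and`, `word_complete`, `rails_changed`) and the phase numbering 0/1 are hypothetical.

```python
def encode(value, phase):
    """Return the rail pair (a, b) for a logic value (0/1) in phase 0 or 1."""
    return (value, value ^ phase)


def value_of(code):
    """The logic value is carried by rail a."""
    return code[0]


def phase_of(code):
    """The phase is the parity of the two rails."""
    return code[0] ^ code[1]


def fsl_and(x1, x2, y_prev):
    """FSL AND: evaluate when both inputs share a phase, otherwise hold."""
    if phase_of(x1) == phase_of(x2):
        return encode(value_of(x1) & value_of(x2), phase_of(x1))
    return y_prev


def word_complete(word, expected_phase):
    """Completion detection: every bit of the word is in the expected phase."""
    return all(phase_of(code) == expected_phase for code in word)


def rails_changed(old, new):
    """Number of rails that differ between two codes."""
    return sum(o != n for o, n in zip(old, new))


# Going from any value in one phase to any value in the other phase
# always switches exactly one rail.
one_rail = all(
    rails_changed(encode(v, p), encode(w, 1 - p)) == 1
    for v in (0, 1) for w in (0, 1) for p in (0, 1)
)
print(one_rail)  # True

y = encode(0, 0)                          # previous output: LO in phase 0
y = fsl_and(encode(1, 1), encode(1, 0), y)
print(y)                                  # (0, 0): inputs in different phases, hold
y = fsl_and(encode(1, 1), encode(1, 1), y)
print(y)                                  # (1, 0): HI in phase 1
```

Because every data item toggles the phase, no spacer wave is needed; the price is that the gate has to compare phases rather than simply test for NULL.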

FSL and Code Conditions
  (C1) Phase change forces separation between successive data waves
  (C2) Coding scheme guarantees direct switch from one legal value in one phase to a legal value in the next phase (only one rail changes!)
  Still, this is different from transition signaling!

Synchronization of Waves

    A  B  Y
    0  0  0
    0  1  0
    1  0  0
    1  1  1

  [Figure: AND gate (&) with inputs A, B and output Y; the inputs change from 1,0 to 0,1; the output shows 0 in phase Φ0, then 0 in phase Φ1]
  no glitch!
  successive "0"s clearly separable

FSL: A Brief Summary
  FSL retains all the charm of NCL
  FSL provides double data throughput
  implementation of the 2-phase scheme requires more effort
  => 4-phase is preferred for computation-intensive tasks, and 2-phase for communication

FSL at a Glance
  dual-rail (level-based) data coding
  2-phase handshake

    phase  value  a  b
    φ0     LO     0  0
    φ0     HI     1  1
    φ1     LO     0  1
    φ1     HI     1  0

  [Figure: rails X1.a, X1.b … Xn.a, Xn.b and Ack; waveform of X1 … Xn showing successive "valid" words together with the Ack signal]

Gain of Delay Insensitivity
  (burdens of synchronous design that delay-insensitive design avoids)
  need to determine clock period
    circuit functionality is technology dependent
    considerable design effort, large design loops
  need to make worst-case assumptions
    necessarily pessimistic
    no robustness with respect to exceeding them
  need to maintain global synchrony
    clock distribution problems
    power consumption problems

Comparing the Styles

    data coding:        single rail     multirail
    ------------------------------------------------
    4-phase (RTZ)       bundled data    NCL
    2-phase (NRZ)       bundled data    FSL, LEDR
    delay model         bounded         QDI
    handshake ACK       explicit        explicit
    handshake REQ       explicit        compl. det.

Test yourself…
  How is the issue condition enforced in DI logic?
  How is the capture condition enforced in DI logic?
  Answers:
    issue condition: we still have an ACK line
    capture condition: completion detection on the code
  Why do we need the NULL wave in NCL then?
    It is the "zero" of the RTZ protocol.
    Otherwise the 1-of-2 coding style is not suitable for DI: condition (C1) is violated.

The Role of Glitches
  synchronous logic
    unproblematic (temporal masking)
    only the clock net is problematic
  bundled data
    unproblematic (temporal masking)
    the control path (Muller pipeline) is problematic
  QDI
    a glitch may trigger completion detection
    a glitch may upset the memory state

Efficiency vs. Assumptions
  timing assumptions make life easier (simpler design process, lower area, …)
  examples:
    "state" abstraction in sync. design
    glitch insensitivity of sync. design / BD
    single-rail coding in bundled data
    isochronic fork assumption in QDI
  however: watch out, assumptions compromise robustness!

Power saving by multirail
  one-of-n coding in combination with 4-phase (RTZ) fulfills the coding requirements
  two transitions per ld(n) = log2(n) bits

    code   data(j,k)
    0000   NULL
    0001   00
    0010   01
    0100   10
    1000   11

  wider bus => fewer transitions
  trade area for power saving
  n-of-n and 1-of-n are the extremes; the whole solution space is k-of-n

The top 10 for async.
  consume power only when needed
  achieve average-case performance
  high intrinsic robustness (PVT, faults, …)
  low EMI emission
  easy modular composition
  metastability has time to resolve
  avoid clock distribution problems
  exploit concurrency more gracefully
  intellectual challenge
  intrinsic elegance
  (global synchrony does not exist anyway)
  Source: [Al Davis, Async'94]

The truth …
  … is that just "going asynchronous" is not beneficial, but in certain cases, with a carefully chosen method and implementation …
    a simple syn => asyn conversion does NOT do the job!
    a mix of different protocols & timing models is required
    tuning of library cells is beneficial
  … asynchronous design can have crucial advantages in real industrial problems

Some Success Stories…
  Caltech
    world's first asynchronous processor (1989)
    LUTONIUM: most power-efficient 8051 implementation
  Achronix
    world's fastest & most power-efficient FPGA
  Fulcrum Systems (now Intel)
    fastest (240 Gbit) router on the market
    GHz SRAM
  ARM / Univ. Manchester
    ARM-compatible core for smartcards (SPA)

Conclusion (1)
  The race condition for REQ can be avoided by appropriate data coding.
  NULL Convention Logic (NCL) implements a 4-phase version of this scheme.
  Four-State Logic (FSL) implements the 2-phase version.
  Both NCL and FSL truly realize closed-loop timing control, yielding high timing robustness; thus the QDI model applies.
  The downside is that both techniques require storage cells even for combinational elements.

Conclusion (2)
  When applied carefully, asynchronous design can exhibit considerable advantages over synchronous solutions with respect to energy consumption, speed, robustness, etc.
  The price is higher design complexity, a lack of tools and libraries, and higher area.
  So there is no general rule for when to prefer asynchronous solutions; it is just an enhancement of the available design space.