Delay Insensitive Methods

Advanced Digital Design
Asynchronous Design: DI Methods
A. Steininger
Vienna University of Technology
Outline

Delay Insensitive design - principle

NULL-Convention Logic

Code conditions for DI logic

Four-State Logic
Lecture "Advanced Digital Design"
© A. Steininger / TU Vienna
Asynchronous Philosophy
recall



„The control flow requires agreement
between source and sink. For this
purpose they need to communicate“
Source indicates capture condition
for sink.
Sink indicates issue condition
for source.
„HANDSHAKE“
recall
Handshake Principle
REQ: „Data word valid, you can use it“
When can SNK use its input?
When it is valid and consistent
[Figure: SRC feeds SNK through f(x); REQ runs from SRC to SNK, ACK from SNK back to SRC]
When can SRC apply the next input?
When SNK has consumed the previous one
ACK: „Data word consumed, send the next“
A very Important Detail
recall

The handshake establishes a
closed-loop control for the data flow
between sender and receiver

This makes operation more robust than in
the synchronous (= open-loop) case

The art of asynchronous design is to
make many of these closed loops
interoperate properly

This is much more complicated than a
synchronous design.
Bundled Data at a Glance


single-rail data coding
4- or 2-phase handshake
Source: [Sparso 06]
Very disappointing…


For a closed loop we need to
measure the quantity of interest
So far we have not done that:




We have not measured validity &
consistency
We have used time as an
indirect measure instead
Thus Bounded Delay methods do not
provide the benefits of a closed loop
BUT: Can we measure validity &
consistency at all?
Criticality of ACK
recall
[Figure: SRC, f(x), SNK; the „capture!“ command is applied at L2]




cannot measure „act of capturing“ as an event
use latching command instead
fork produces race between trigger process
and next data wave
race is uncritical (but still exists!)
Criticality of REQ
recall
[Figure: SRC, f(x), SNK]
cannot use issue trigger as an event:
produces unacceptable race between
data and REQ
must introduce timer (bounded delay)
OR: find better event (downstream)
completion detection
Completion Detection






In order to judge when data are valid &
consistent we need to be able to see
when this is NOT the case
not possible with Boolean logic
need representation for INVALID
an ACK in parallel to data (bundled
data) will always cause a race
need more than two signal states for
every individual bit (!)
need more than one rail per bit
Multi-level Logic

use more than two (e.g. three) voltage
levels per rail

allows expressing „invalid“ in the currently
„forbidden“ region between HI and LO

requires two thresholds for every gate
input

output must be able to drive three
different levels reliably

causes substantial technological problems
not further pursued
Our Options
recall


We must only use consistent input
vectors
How can we tell an input vector is
consistent?
(1) use TIME to mark consistent phases


synchronous approach / global time base
asynchronous/bounded delay
(2) use CODING to add information

asynchronous/delay insensitive
Terminology
recall
consistent DW: all bits belong to the same context
valid signal: result of function applied to consistent DW
NULL Convention Logic
Add the value NULL to the alphabet
two-rail coding:
Signal X
X.a  X.b  meaning
0    0    NULL (N)
0    1    TRUE (T)
1    0    FALSE (F)
1    1    illegal
[Figure: signal X split into the two rails X.a and X.b; TRUE and FALSE together form „DATA“]
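The two-rail coding can be written down as a small lookup. The following is only a sketch of the table on this slide; the names `NCL`, `encode` and `decode` are made up for illustration and are not part of the lecture.

```python
from enum import Enum


class NCL(Enum):
    """Rail levels (X.a, X.b) for each legal NCL value, as in the slide's table."""
    NULL = (0, 0)
    TRUE = (0, 1)
    FALSE = (1, 0)


def encode(value: NCL) -> tuple:
    """Return the rail pair (X.a, X.b) for an NCL value."""
    return value.value


def decode(a: int, b: int) -> NCL:
    """Map rail levels to an NCL value; (1, 1) is the illegal code word."""
    if (a, b) == (1, 1):
        raise ValueError("illegal code word (1, 1)")
    return NCL((a, b))


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            try:
                print(a, b, decode(a, b).name)
            except ValueError as err:
                print(a, b, err)
```

Treating (1, 1) as an error rather than a fourth value mirrors the „illegal“ row of the table.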
NCL Functions
AND | T F N
 T  | T F N
 F  | F F N
 N  | N N N

OR  | T F N
 T  | T T N
 F  | T F N
 N  | N N N

NOT |
 T  | F
 F  | T
 N  | N
naive approach:
if any input is „N“ then output „N“
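The naive rule can be sketched in a few lines. This is an illustration of the three tables on this slide only; the string symbols "T", "F", "N" and the function names are assumptions made for the example.

```python
T, F, N = "T", "F", "N"  # TRUE, FALSE, NULL (symbols chosen for this sketch)


def ncl_not(x):
    """NOT: swaps T and F, NULL stays NULL."""
    return {T: F, F: T, N: N}[x]


def ncl_and(x, y):
    """Naive AND: any NULL input forces a NULL output."""
    if N in (x, y):
        return N
    return T if (x == T and y == T) else F


def ncl_or(x, y):
    """Naive OR: any NULL input forces a NULL output."""
    if N in (x, y):
        return N
    return T if T in (x, y) else F


if __name__ == "__main__":
    for x in (T, F, N):
        print(x, [ncl_and(x, y) for y in (T, F, N)],
              [ncl_or(x, y) for y in (T, F, N)], ncl_not(x))
```

These functions are stateless; the later slides show why that is not sufficient.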
NCL Flow Control

NULL waves enframe DATA waves
[Figure: bit streams over time t; every DATA wave (TRUE/FALSE values) is framed by NULL waves, and the point where all bits carry DATA is marked „consistent DATA“]
Completion detection =
check whether all bits are „DATA“
(completeness of DATA)
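A minimal software sketch of this check, assuming each bit is given as a rail pair (X.a, X.b) with TRUE = (0, 1) and FALSE = (1, 0) as in the NCL coding table; the function names are hypothetical.

```python
def is_data(rails):
    """A bit carries DATA when exactly one of its two rails is high."""
    a, b = rails
    return (a, b) in ((0, 1), (1, 0))


def word_complete(word):
    """Completeness of DATA: every bit of the word has left NULL."""
    return all(is_data(bit) for bit in word)


if __name__ == "__main__":
    arriving = [(0, 1), (0, 0), (1, 0)]   # middle bit is still NULL
    arrived = [(0, 1), (1, 0), (1, 0)]    # all bits are DATA
    print(word_complete(arriving), word_complete(arrived))
```

Note that this checks completeness of DATA only; it cannot tell whether all bits belong to the same wave.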
Still Problems …
What about this situation?
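[Figure: bit streams and the resulting output over time t; one bit is still DATA (TRUE) from the previous word while the other bits have already passed NULL and carry the next DATA wave, and this mixture is marked „consistent DATA“]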
Fast bits may catch up with a slow bit from
the previous word. The word containing
the „old“ bit is considered consistent!
Solution Principle

Enforce „completeness of NULL“ as
well:


The output must not go to NULL before all
inputs have changed to NULL
In a closed loop configuration this keeps
the slow paths in synchrony with the fast
ones
We need a different truth table when
the output is NULL
Two Truth Tables
for DATA waves        for NULL waves
AND | T F N           AND | T F N
 T  | T F N            T  | T F D
 F  | F F N            F  | F F D
 N  | N N N            N  | D D N
D … DATA (T or F)
must hold output in last valid state before
new input is complete
need „hysteresis“
need to consider current output in truth table
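The two truth tables collapse into one rule once the gate remembers its output: evaluate on complete DATA, reset on complete NULL, hold otherwise. A minimal sketch of such an AND gate, assuming a NULL-initialized output; the class name `HysteresisAnd` is made up for this example.

```python
T, F, N = "T", "F", "N"  # TRUE, FALSE, NULL


class HysteresisAnd:
    """AND gate that holds its last output until the inputs are complete."""

    def __init__(self):
        self.y = N  # assumption: the circuit starts NULL-initialized

    def step(self, a, b):
        if a != N and b != N:        # complete DATA: evaluate the AND
            self.y = T if (a == T and b == T) else F
        elif a == N and b == N:      # complete NULL: return to NULL
            self.y = N
        # mixed DATA/NULL inputs: hold the current output
        return self.y


if __name__ == "__main__":
    gate = HysteresisAnd()
    for a, b in [(T, N), (T, T), (N, T), (N, N), (F, N), (F, T)]:
        print(a, b, "->", gate.step(a, b))
```

The held value in the mixed case is the „D“ entry of the NULL-wave table and the „N“ entry of the DATA-wave table.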
Feedback Gate
[Figure: AND gate (&) with inputs A and B whose output Y is fed back as an additional input; a table lists the next output Y‘ for every combination of A, B and the current output Y (each T, F or N); combinations with Y ≠ Y‘ are marked unstable]
No more Problems …
Have we solved the problem?
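[Figure: bit streams and the resulting output over time t; the output stays at DATA while the slow bit is still DATA, and the next DATA wave is marked „consistent DATA“ only after all bits have passed NULL]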
YES! The output now remains at DATA as long as the
slowest bit does, which (via the closed loop) prevents
the fast bits from conveying the next DATA wave.
NCL Gates
The desired hysteresis requires an NCL
gate to hold its output until


all inputs are DATA or
all inputs are NULL
need storage capability (or feedback loop) even in a combinational gate
[Figure: dual-rail gate with inputs X1 (X1.a, X1.b) and X2 (X2.a, X2.b) and output Y (Y.a, Y.b); each output rail has its own memory element (Mem)]
NCL Gate Implementation
p- and n-stack not dual
figure shown for one output rail only
CMOS transistors only, but no standard cells
memory cell at output
[Figure: circuit of a gate with inputs X1 (X1.a, X1.b) and X2 (X2.a, X2.b), memory elements (Mem) and output Y (Y.a, Y.b)]
[G. Sobelman, K. Fant: CMOS Circuit Design
of Threshold Gates with Hysteresis]
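The behaviour of a threshold gate with hysteresis can be sketched at the logic level. The slide only shows the circuit; the m-of-n formulation below (set when at least m inputs are high, reset only when all inputs are low, hold otherwise) is the usual description in the NCL literature and is an assumption here, as is the class name `ThresholdGate`.

```python
class ThresholdGate:
    """Logic-level sketch of an m-of-n threshold gate with hysteresis."""

    def __init__(self, m, n):
        if not 1 <= m <= n:
            raise ValueError("need 1 <= m <= n")
        self.m, self.n = m, n
        self.out = 0  # assumption: output starts low

    def step(self, inputs):
        if len(inputs) != self.n:
            raise ValueError("wrong number of inputs")
        ones = sum(inputs)
        if ones >= self.m:   # threshold reached: set
            self.out = 1
        elif ones == 0:      # all inputs low: reset
            self.out = 0
        # anything in between: hold (this is the hysteresis)
        return self.out


if __name__ == "__main__":
    gate = ThresholdGate(2, 2)  # 2-of-2 behaves like a Muller C-element
    for x in ([1, 0], [1, 1], [0, 1], [0, 0]):
        print(x, "->", gate.step(x))
```

The held state is what the memory cell at the output provides in the transistor-level circuit.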
The Charm of NCL

self-regulating data flow





in a NULL initialized circuit a DATA front
will propagate towards the output
alternating waves of NULL and DATA
pace the data flow (which, in some
sense, forms the „clock“)
based on direct assessment of
validity & consistency
no delay assumptions necessary
(ideally), no „worst case“, …
globally applicable solution
Validity and Consistency

Consistency (multiple bits @ input)


all bits that are combined are valid
and belong to the same context
Validity (single bit @ output)

the bit is the stable result of a
combination of consistent bits
Consistency implies validity (per
definition) but NOT vice versa!
Validity & Consistency in NCL

Validity:




output is changed only when consistent
input is available („hold“ in truth table)
coding ensures direct transition from
one valid code to another (NULL is valid but
spacer only)
continuous validity
Consistency:


NULL spacer between DATA waves
allows identification of context
synchronization of context by virtue of
„completeness of NULL“ condition
What about sync. & BD?

Timing ensures that every data item is
both valid and consistent at the time it is
used:


choice of clock period (sync)
choice of delay values (BD)

In contrast to NCL, (temporary) invalidity &
inconsistency of data are tolerated.

No explicit measures (other than timing)
are taken/necessary to cope with these
issues.
Delay Models
recall

synchronous model: known bounds for delays, global timing

bounded delay model (BD, fundamental): known bounds for absolute delays, local timing

scalable-delay-insensitive model (SDI): bounds for relative deviation between delays known

quasi-delay-insensitive (QDI): output paths of a fork have same delay

delay insensitive (DI): no restrictions on delays (just finite)
NCL: A Brief Summary

validity & consistency directly visible

no timing assumptions required (ideally)

„delay insensitive“ (ideally)

suitable for CMOS implementation

coding of one bit on two rails

2 memory cells per combinational output

efficiency: 50% of the data flow are
unproductive NULL waves

patented and industrially used
NCL at a Glance


dual-rail data coding
4-phase handshake
Source: [Sparso 06]
Our Options
recall


We must only use consistent input
vectors
How can we tell an input vector is
consistent?
(1) use TIME to mark consistent phases


synchronous approach / global time base
asynchronous/bounded delay
(2) use CODING to add information

asynchronous/delay insensitive
Conditions for DI Coding
completion detection needs ONE UNIQUE event:
(C1) Identification of every context switch
It must be possible to clearly separate two
successive data words under all
circumstances => prohibit having no event
(C2) Unique context membership
The transition from one valid code word to
the next must be unambiguous, i.e. no
intermediate state may be a valid code
=> prohibit having more than one event
Conditions for DI coding
(C1) Identification of every context switch
It must be possible to clearly separate two
successive data words under all
circumstances
[Figure: code word 0,0,0 followed by code word 0,0,0, marked with „?“]
Conditions for DI coding
[Figure: transition from 0,0,0 via 1,0,0 and 1,0,1 to 1,1,1, marked with „?“]
(C2) Unique context membership
The transition from one valid code word to
the next must be unambiguous, i.e. no
intermediate state may be a valid code
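Both conditions can be checked mechanically for a given code. The sketch below assumes that the differing bits of two successive code words may flip in any order (arbitrary delays), so every partial flip is a possible intermediate state; all function names are hypothetical.

```python
from itertools import combinations, product


def intermediate_states(src, dst):
    """States that can be observed while src changes to dst bit by bit."""
    diff = [i for i, (s, d) in enumerate(zip(src, dst)) if s != d]
    states = set()
    for k in range(1, len(diff)):            # proper, non-empty subsets
        for subset in combinations(diff, k):
            state = list(src)
            for i in subset:
                state[i] = dst[i]
            states.add(tuple(state))
    return states


def violates_c1(src, dst):
    """(C1) is violated when two successive words cause no event at all."""
    return src == dst


def violates_c2(src, dst, valid):
    """(C2) is violated when an intermediate state is itself a valid code."""
    return any(s in valid for s in intermediate_states(src, dst))


if __name__ == "__main__":
    binary = set(product((0, 1), repeat=3))       # plain 3-bit words
    print(violates_c1((0, 0, 0), (0, 0, 0)))      # no event
    print(violates_c2((0, 0, 0), (1, 1, 1), binary))
    ncl = {(0, 0), (0, 1), (1, 0)}                # one dual-rail NCL signal
    print(violates_c2((0, 0), (0, 1), ncl))       # NULL -> TRUE
```

For plain binary words both conditions fail, which matches the two „?“ examples on these slides; the NULL to TRUE step of one dual-rail signal changes a single rail and therefore has no intermediate state.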
What about NCL's Coding?

(C1) Return to NULL forces separation
between successive data waves

(C2) Coding scheme guarantees direct switch from one legal value to next
(only one rail changes!)
Signal X
X.a  X.b  value
0    0    NULL
0    1    TRUE
1    0    FALSE
1    1    illegal
Synchronization of Waves
A B Y
0 0 0
0 1 0
1 0 0
1 1 1
[Figure: AND gate (&) with inputs A and B and output Y; the code words 1,0 and 0,1 at the inputs are separated by NULL (N) waves, and the output reads 0, N, 0. No glitch! Successive „0“s are clearly separable.]
More Efficient Coding?




NCL employs a 4-phase (RTZ)
version of transition signaling.
The „return to zero“ is due to the
NULL waves.
The NULL waves are unproductive
and hence undesired.
Can we employ a 2-phase (NRZ)
signaling instead?
NCL vs. Trans. Signaling

Transition Signaling
[Figure: rails A0 and A1 for the sequence A=0, A=1, A=1, A=1, A=0]

NULL-Convention Logic
[Figure: rails A0 and A1 for the same sequence A=0, A=1, A=1, A=1, A=0]
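The difference between the two waveforms can be reproduced with a short sketch. It assumes that in transition signaling a 0 toggles rail A0 and a 1 toggles rail A1, and that in NCL the corresponding rail is raised and then returned to zero; the function names are made up for this example.

```python
def transition_signaling(bits):
    """2-phase (NRZ): each data item toggles exactly one of the two rails."""
    a0 = a1 = 0
    trace = [(a0, a1)]
    for bit in bits:
        if bit == 0:
            a0 ^= 1
        else:
            a1 ^= 1
        trace.append((a0, a1))
    return trace


def ncl_rtz(bits):
    """4-phase (RTZ): raise one rail for the item, then return to NULL."""
    trace = [(0, 0)]
    for bit in bits:
        trace.append((1, 0) if bit == 0 else (0, 1))  # (A0, A1)
        trace.append((0, 0))                          # NULL wave
    return trace


def count_transitions(trace):
    """Total number of rail transitions along a trace of (A0, A1) states."""
    return sum(sum(x != y for x, y in zip(p, q))
               for p, q in zip(trace, trace[1:]))


if __name__ == "__main__":
    sequence = [0, 1, 1, 1, 0]  # the sequence used on this slide
    print(count_transitions(transition_signaling(sequence)))  # 5
    print(count_transitions(ncl_rtz(sequence)))               # 10
```

Half of the NCL transitions belong to the NULL waves, which is the overhead a 2-phase scheme tries to avoid.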
Transition Signaling Protocol
Source: [Sparso 06]


high throughput
difficult to implement: how to attain
transition-based completion detection?
Alternative: 2-phase protocol based on state signaling
Four-State Logic (FSL)

Use 2 codes per logic value
two-rail coding:
[Figure: signal X split into the two rails X.a and X.b]
FSL Flow Control

Alternate code sets („phase“)
[Figure: the same bit streams over time t in NCL and in FSL; where NCL alternates NULL and DATA (TRUE/FALSE) waves, FSL alternates between the code sets H/L and h/l; words are marked consistent in phase φ1 and in phase φ0]
Completion detection: Check whether
all bits belong to the same phase
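A sketch of phase-based completion detection, assuming the rail coding listed on the „FSL at a Glance“ slide (phase φ0: LO = 0,0 and HI = 1,1; phase φ1: LO = 0,1 and HI = 1,0). Under that coding the phase is the XOR of the two rails and the value is rail a. The function names are hypothetical.

```python
def fsl_encode(value, phase):
    """Rails (a, b) for a logic value (0/1) in the given phase (0/1)."""
    return (value, value ^ phase)


def fsl_decode(a, b):
    """Return (value, phase) for a rail pair; all four codes are legal."""
    return a, a ^ b


def same_phase(word):
    """Completion detection: all bits of the word belong to one phase."""
    return len({a ^ b for a, b in word}) == 1


if __name__ == "__main__":
    old = [fsl_encode(v, 0) for v in (0, 1, 1)]   # word in phase 0
    new = [fsl_encode(v, 1) for v in (1, 1, 0)]   # next word in phase 1
    mixed = [new[0], old[1], new[2]]              # middle bit is late
    print(same_phase(old), same_phase(mixed), same_phase(new))
```

Any value in one phase differs from any value in the other phase in exactly one rail, so each bit needs a single transition per data word.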
FSL AND-Gate: Truth Table
Y            IN_1
IN_2   |  l  h  L  H
  l    |  l  l  *  *
  h    |  l  h  *  *
  L    |  *  *  L  L
  H    |  *  *  L  H
* … hold last valid output
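The truth table translates directly into a gate with one stored output symbol. A minimal sketch, using the slide's symbols l, h (one phase) and L, H (the other phase); the class name `FslAnd` and the initial output "L" are assumptions for illustration.

```python
class FslAnd:
    """FSL AND gate: evaluates when both inputs share a phase, else holds."""

    def __init__(self, initial="L"):
        self.y = initial  # assumption: some valid symbol is stored at start

    def step(self, in1, in2):
        if in1.islower() == in2.islower():          # same phase: evaluate
            both_high = in1.lower() == "h" and in2.lower() == "h"
            out = "h" if both_high else "l"
            self.y = out if in1.islower() else out.upper()
        # different phases: hold last valid output ("*" in the table)
        return self.y


if __name__ == "__main__":
    gate = FslAnd()
    for a, b in [("h", "h"), ("H", "h"), ("H", "L"), ("l", "L"), ("l", "h")]:
        print(a, b, "->", gate.step(a, b))
```

Unlike the NCL gate there is no spacer state: every evaluation produces a new data value in the new phase.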
Four-State Logic (FSL)
An FSL gate holds its output until
all inputs are in the same phase
need storage capability (or feedback loop) even in a combinational gate
[Figure: dual-rail gate with inputs X1 (X1.a, X1.b) and X2 (X2.a, X2.b) and output Y (Y.a, Y.b); each output rail has its own memory element (Mem)]
FSL and Code Conditions

(C1) Phase change forces separation
between successive data waves

(C2) Coding scheme guarantees direct switch from one legal value in one
phase to legal value in next phase
(only one rail changes!)
still this is different from transition signaling!
Synchronization of Waves
A B Y
0 0 0
0 1 0
1 0 0
1 1 1
[Figure: AND gate (&) with inputs A and B and output Y; the code words 1,0 and 0,1 arrive at the inputs, and the output shows two successive 0s in phases Φ0 and Φ1. No glitch! Successive „0“s are clearly separable.]
FSL: A Brief Summary
FSL retains all the charm of NCL
FSL provides double data throughput
implementation of 2-phase scheme
requires more effort
=> 4-phase is preferred for computation-intensive
tasks,
and 2-phase for communication
FSL at a Glance


dual-rail (level based) data coding
2-phase handshake
phase  value  a  b
φ0     LO     0  0
φ0     HI     1  1
φ1     LO     0  1
φ1     HI     1  0
[Figure: timing diagram of the signals X1 … Xn (rails X1.a, X1.b, …, Xn.a, Xn.b) and Ack; successive words are each marked „valid“]
Gain of Delay Insensitive






need to determine clock period


circuit functionality is technology dependent
considerable design efforts, large design
loops
need to make worst-case assumptions


necessarily pessimistic
no robustness wrt. exceeding them
need to maintain global synchrony


clock distribution problems
power consumption problems
Comparing the Styles
                      handshake style
data coding    4-phase (RTZ)    2-phase (NRZ)
single rail    bundled data     bundled data
multirail      NCL              FSL, LEDR

               single rail           multirail
delay model    bounded               QDI
ACK            explicit handshake    explicit handshake
REQ            explicit              compl. det.
Test yourself…

How is the issue condition enforced
in DI logic?
We still have an ACK line

How is the capture condition
enforced in DI logic?
completion detection on code

Why do we need the NULL wave in
NCL then?
It's the „zero“ of the RTZ protocol
Otherwise the 1-of-2 coding style is not
suitable for DI: condition (C1) violated
The Role of Glitches

synchronous logic
unproblematic (temporal masking)
only clock net is problematic

bundled data
unproblematic (temporal masking)
control path (Muller pipeline) is
problematic

QDI
glitch may trigger completion detection
glitch may upset memory state
Efficiency vs. Assumptions

timing assumptions make life easier
(simpler design process, lower area,…)

examples:





„state“ abstraction in sync. design
glitch insensitivity of sync design/BD
single-rail coding in bundled data
isochronic fork assumption in QDI
however: watch out, assumptions
compromise robustness!
Power saving by multirail

one-of-n-coding


in combination with 4-phase (RTZ)
fulfills coding requirements
two transitions per ld(n) = log2(n) bits
code       0000  0001  0010  0100  1000
data(j,k)  NULL  00    01    10    11
wider bus => fewer transitions
trade area for power saving
n-of-n and 1-of-n are the extremes;
the whole solution space is k-of-n
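The area/power trade-off can be put into numbers with a small sketch. It assumes 1-of-n coding with a 4-phase (RTZ) protocol, i.e. one rail rises and falls per symbol, and the bit order of the code table on this slide; the function names are hypothetical.

```python
import math


def one_of_n_code(symbol, n):
    """Code word for symbol in range(n), e.g. symbol 0 of 4 -> '0001'."""
    if not 0 <= symbol < n:
        raise ValueError("symbol out of range")
    return "0" * (n - 1 - symbol) + "1" + "0" * symbol


def transitions_per_bit(n):
    """RTZ: 2 transitions (rise and fall) carry log2(n) bits."""
    return 2 / math.log2(n)


def wires_per_bit(n):
    """n rails carry log2(n) bits."""
    return n / math.log2(n)


if __name__ == "__main__":
    print([one_of_n_code(s, 4) for s in range(4)])
    for n in (2, 4, 16):
        print(n, transitions_per_bit(n), wires_per_bit(n))
```

Going from 1-of-2 (dual-rail) to 1-of-4 halves the transitions per bit at the same number of wires per bit; beyond that, fewer transitions are paid for with more wires.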
The top 10 for async.











consume power only when needed
achieve average case performance
high intrinsic robustness (PVT, faults,…)
low EMI emission
easy modular composition
metastability has time to resolve
avoid clock distribution problems
exploit concurrence more gracefully
intellectual challenge
intrinsic elegance
(global synchrony does not exist anyway)
[Al Davis, Async’94]
The truth …



is, that just „going asynchronous“ is not
beneficial, but
in certain cases
with carefully chosen method and
implementation…




simple syn=> asyn conversion
does NOT do the job!
a mix of different protocols & timing models is
required
tuning of library cells is beneficial
… asynchronous design can have crucial
advantages, in real industrial problems
Some Success Stories…

Caltech
world's first asynchronous processor (1989)
LUTONIUM: most power-efficient 8051 implementation

Achronix
world's fastest & most power-efficient FPGA

Fulcrum Systems (now Intel)
fastest (240Gbit) router on the market
GHz SRAM

ARM / Univ. Manchester
ARM-compatible core for smartcards (SPA)
Conclusion (1)





The race condition for REQ can be avoided
by appropriate data coding
Null Convention Logic (NCL) implements a
4-phase version of this scheme
Four-State Logic (FSL) implements the 2-phase version
Both NCL and FSL truly realize the closed-loop timing control, yielding high timing
robustness; thus the QDI model applies.
The downside is that both techniques
require storage cells even for
combinational elements.
Conclusion (2)



When applied carefully, asynchronous
design can exhibit considerable advantages
over synchronous solutions with respect to
energy consumption, speed, robustness,
etc.
The price is higher design complexity, lack
of tools and libraries, and higher area
So there is no general rule of when to
prefer asynchronous solutions – it’s just an
enhancement of the available design
space.