De-synchronization:
from synchronous to asynchronous
Based on the paper:
Blunno, Cortadella, Kondratyev, Lavagno, Lwin, Sotiriou,
Handshake protocols for de-synchronization,
ASYNC 2004.
Outline
• What is de-synchronization ?
• Behavioral equivalence
• 4-phase protocols for de-synchronization
• Concurrency
• Correctness
• An example

De-synchronize
[Figure: a synchronous circuit (CLK) is transformed into an asynchronous circuit]
Synchronous circuit
[Figure: datapath of master-slave (MS) flip-flops, drawn as pairs of latches (L), all driven by CLK]
De-synchronization
[Figure: the same datapath of latches (L), with local controllers (C) attached to the latches]
De-synchronization
Distributed controllers substitute the clock network
The data path remains intact !
[Figure: network of controllers (C) over the unchanged data path]
Design flow

Think synchronous

Design synchronous:
one clock and edge-triggered flip-flops

De-synchronize (automatically)

Run it asynchronously
Prior work

Micropipelines (Sutherland, 1989)

Local generation of clocks
• Varshavsky et al., 1995
• Kol and Ginosar, 1996

Theseus Logic (Ligthart et al., 2000)
• Commercial HDL synthesis tools
• Direct translation and special registers

Phased logic (Linder and Harden, 1996; Reese, Thornton, Traver, 2003)
• Conceptually similar
• Different handshake protocol (2 phase vs. 4 phase)
Automatic de-synchronization

Devise an automatic method for
de-synchronization

Identify a subclass of synchronous circuits
suitable for de-synchronization

Formally prove correctness
Synchronous flow + de-synchronized flow
Flow equivalence
[Le Guernic, Talpin, Le Lann, 2003]
Flow equivalence
[Figure: waveforms of CLK and of latches A and B. In the synchronous behavior and in the de-synchronized behavior each latch stores the same sequence of values, although at different times.]
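The comparison of the two behaviors can be sketched in a few lines of Python. This is only an illustration, assuming a trace is a list of (time, latch, value) storage events; the helper names and the sample values are hypothetical and are not the numbers of the figure. Two finite traces are treated as flow-equivalent when every latch stores the same sequence of values, regardless of when the values are stored.

```python
from typing import Dict, List, Tuple

# Assumed trace format: (time, latch name, value stored by the latch).
Event = Tuple[float, str, int]

def flows(trace: List[Event]) -> Dict[str, List[int]]:
    """Project a trace onto each latch: the ordered list of values it stored."""
    per_latch: Dict[str, List[int]] = {}
    for _, latch, value in sorted(trace, key=lambda e: e[0]):
        per_latch.setdefault(latch, []).append(value)
    return per_latch

def flow_equivalent(t1: List[Event], t2: List[Event]) -> bool:
    """Same value sequence per latch; the time stamps are ignored."""
    return flows(t1) == flows(t2)

# Hypothetical synchronous run: both latches store on every clock edge.
sync = [(0, "A", 1), (0, "B", 0), (10, "A", 5), (10, "B", 2),
        (20, "A", 3), (20, "B", 1)]
# Hypothetical de-synchronized run: same values, skewed in time.
desync = [(0, "A", 1), (3, "B", 0), (7, "A", 5), (12, "B", 2),
          (16, "B", 1), (19, "A", 3)]
# Hypothetical faulty run: latch B misses one value (data lost).
lossy = [(0, "A", 1), (3, "B", 0), (7, "A", 5), (19, "A", 3), (21, "B", 1)]

print(flow_equivalent(sync, desync))
print(flow_equivalent(sync, lossy))
```

The check deliberately ignores the relative order of events on different latches, which is what allows the de-synchronized circuit to drift away from the global clock schedule.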
[Figure: pipeline of latches (L) with their controllers (C)]
[Figure: animation of the control signals of four consecutive latches A, B, C, D, with the event sequence A+, B-, C+, D-, A-, B+, C-, D+, ... The frames illustrate two constraints between adjacent latches:]
• A latch cannot read another data item until the successor has captured the current one
• A latch cannot become opaque before having captured the data item from its predecessor
[Figure: resulting marked graph over the events A+, B+, C+, D+, A-, B-, C-, D-, and its restriction to two adjacent latches A and B]
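The two constraints can be played as a token game on a small marked graph. The sketch below is an assumption-laden illustration, not the graph of the slides: it assumes that "+" makes a latch transparent and "-" makes it opaque, that the constraints between a latch A and its successor B translate into the arcs B- → A+ and A+ → B-, and that the initial marking is the one chosen in the code.

```python
import random

# Assumed arcs of the two-latch model; every arc is a place holding 0 or 1 tokens.
ARCS = [
    ("A+", "A-"), ("A-", "A+"),  # latch A alternates transparent / opaque
    ("B+", "B-"), ("B-", "B+"),  # latch B alternates transparent / opaque
    ("A+", "B-"),                # B closes only after A has offered a new item
    ("B-", "A+"),                # A reopens only after B has captured the item
]
# Assumed initial marking: A opaque, B transparent, B has captured the last item.
INITIAL = {("A-", "A+"): 1, ("B+", "B-"): 1, ("B-", "A+"): 1}

def enabled(marking):
    """Events whose input arcs all carry a token."""
    events = {e for arc in ARCS for e in arc}
    return sorted(e for e in events
                  if all(marking[a] > 0 for a in ARCS if a[1] == e))

def fire(marking, event):
    """Consume one token per input arc, produce one token per output arc."""
    new = dict(marking)
    for a in ARCS:
        if a[1] == event:
            new[a] -= 1
        if a[0] == event:
            new[a] += 1
    return new

def simulate(steps, seed=0):
    rng = random.Random(seed)
    marking = {a: INITIAL.get(a, 0) for a in ARCS}
    trace = []
    for _ in range(steps):
        choices = enabled(marking)
        if not choices:          # deadlock; not expected for this marking
            break
        event = rng.choice(choices)
        marking = fire(marking, event)
        trace.append(event)
    return trace

trace = simulate(200)
# Keep only the events that hand a data item from A to B.
handoff = [e for e in trace if e in ("A+", "B-")]
print(handoff[:6])
```

Whatever interleaving the random choices produce, A+ and B- alternate strictly, so no item is overwritten before it is captured and none is captured twice.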
Can we increase concurrency ?
[Figure: marked graphs over A+, B+, A-, B- that allow more concurrency between latches A and B]
not flow-equivalent:
• data overrun
• data lost
Can we reduce concurrency ? How much ?
[Figure: marked graphs over A+, B+, A-, B- with decreasing concurrency: 8 states, 6 states, 5 states, 4 states]
[Figure: protocols between latches A and B, arranged by concurrency:]
• de-synchronization model
• fully decoupled (Furber & Day)
• GasP, IPCMOS
• semi-decoupled (Furber & Day)
• simple 4-phase
• non-overlapping
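The state counts quoted for the protocols can be reproduced by enumerating the reachable markings of a marked graph. The sketch below is an illustration under assumptions: the arc sets and initial markings are hypothetical reconstructions (the most concurrent one follows the two constraints of the de-synchronization model, the least concurrent one is a plain sequential cycle), and the intermediate protocols are not modeled.

```python
from collections import deque

def reachable_states(arcs, initial):
    """Breadth-first enumeration of the markings of a (safe) marked graph."""
    events = sorted({e for a in arcs for e in a})
    start = tuple(initial.get(a, 0) for a in arcs)
    seen = {start}
    queue = deque([start])
    while queue:
        marking = queue.popleft()
        for ev in events:
            inputs = [i for i, a in enumerate(arcs) if a[1] == ev]
            if not all(marking[i] > 0 for i in inputs):
                continue                      # event not enabled
            nxt = list(marking)
            for i, a in enumerate(arcs):
                if a[1] == ev:
                    nxt[i] -= 1               # consume from input arcs
                if a[0] == ev:
                    nxt[i] += 1               # produce on output arcs
            nxt = tuple(nxt)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Assumed arcs of the de-synchronization model for two adjacent latches.
MODEL = [("A+", "A-"), ("A-", "A+"), ("B+", "B-"), ("B-", "B+"),
         ("A+", "B-"), ("B-", "A+")]
MODEL_INIT = {("A-", "A+"): 1, ("B+", "B-"): 1, ("B-", "A+"): 1}

# Hypothetical fully sequential cycle: the two latch pulses never overlap.
SEQUENTIAL = [("A+", "A-"), ("A-", "B+"), ("B+", "B-"), ("B-", "A+")]
SEQUENTIAL_INIT = {("B-", "A+"): 1}

print(len(reachable_states(MODEL, MODEL_INIT)))
print(len(reachable_states(SEQUENTIAL, SEQUENTIAL_INIT)))
```

Under these assumptions the two counts are 8 and 4, the extremes quoted above; adding arcs to the first graph removes states and therefore concurrency.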
4-phase latch controllers
[Figure: latch controller with input handshake Rin/Ain, output handshake Rout/Aout and latch control Lt]
Furber and Day, IEEE Trans. VLSI, June 1996
Implementation note: Lt=0 (transparent), Lt=1 (opaque)
[Figure: signal transition graph over the events Rin+, Ain+, Rout+, Aout+, Lt+, Rin-, Ain-, Rout-, Aout-, Lt-, with a "?" in the graph]
4-phase latch controllers
[Figure: signal transition graph over the events Rin+, Ain+, Rout+, Aout+, Lt+, Rin-, Ain-, Rout-, Aout-, Lt-]
Simple 4-phase controller
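Every controller above speaks a 4-phase (return-to-zero) handshake on each of its channels: request up, acknowledge up, request down, acknowledge down. The sketch below checks that property on a trace of signal events. It is a generic illustration: the function names are hypothetical, and the sample trace is an invented interleaving, not the transition graph of any Furber and Day controller.

```python
# One full 4-phase cycle on a single channel.
FOUR_PHASE = ["req+", "ack+", "req-", "ack-"]

def is_four_phase(channel_trace):
    """True if the trace is a prefix of repeated req+ ack+ req- ack- cycles."""
    return all(ev == FOUR_PHASE[i % 4] for i, ev in enumerate(channel_trace))

def project(trace, req, ack):
    """Keep only the events of one channel and rename them to req/ack."""
    rename = {req + "+": "req+", req + "-": "req-",
              ack + "+": "ack+", ack + "-": "ack-"}
    return [rename[e] for e in trace if e in rename]

# Hypothetical interleaving of the input channel (Rin/Ain) and the
# output channel (Rout/Aout) of one controller.
trace = ["Rin+", "Ain+", "Rout+", "Rin-", "Aout+", "Ain-", "Rout-", "Aout-"]

print(is_four_phase(project(trace, "Rin", "Ain")))
print(is_four_phase(project(trace, "Rout", "Aout")))
print(is_four_phase(["req+", "req-", "ack+"]))
```

The controllers differ only in how the events of the two channels and of Lt are ordered with respect to each other; each channel taken alone follows the same four steps.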
4-phase latch controllers
[Figure: signal transition graph over the events Rin+, Ain+, Rout+, Aout+, Lt+, A+, Rin-, Ain-, Rout-, Aout-, Lt-, A-]
Semi-decoupled controller
4-phase latch controllers
[Figure: signal transition graph over the events Rin+, Ain+, Rout+, Aout+, Lt+, A+, B+, Rin-, Ain-, Rout-, Aout-, Lt-, A-, B-]
Fully decoupled controller
4-phase latch controllers (state graphs)
Semi-decoupled controller
Fully decoupled controller
[Figure: two latches A and B, each with a controller (cntrl), connected through the handshake channels Ri/Ai, Rx/Ax and Ro/Ao and running the semi-decoupled 4-phase protocol. The first frame shows the handshake events (Ri+, Ai+, Ri-, Ai-, Rx+, Ax+, Rx-, Ro+, Ao+, Ro-, ...); the following frames keep only the latch events A+, B+, A-, B-.]
(semi-decoupled 4-phase protocol)
Which protocols are valid for de-synchronization ?
[Figure: marked graph over the events A+, B+, A-, B-]
Theorem:
the de-synchronization protocol
preserves flow-equivalence
Proof: by induction on the length of the traces
Base case: same latch values at reset
Induction step:
same values at cycle i ⇒ same values at cycle i+1
[Figure: marked graphs over the events A+, B+, A-, B-]
Theorem:
any reduction in concurrency preserves flow-equivalence
Any hybrid approach preserves flow-equivalence !
[Figure: pipeline of latches A, B, C, D whose neighboring pairs use different protocols (semi-decoupled, non-overlapping, fully decoupled), with the combined marked graph over A+, B+, C+, D+, A-, B-, C-, D-]
Flow-equivalence is preserved, … but …
Liveness

Preservation of flow-equivalence:
all the generated traces are equivalent

Are all traces generated ?
(Is the marked graph live ?)
Not always !
[Figure: marked graph over A+, B+, C+, D+, A-, B-, C-, D- for the semi-decoupled 4-phase handshake protocol]
Liveness: all cycles have at least one token [Commoner 1971]
[Figure: marked graph over the same events for the simple 4-phase handshake protocol]
Results about liveness

At least three latches in a ring are required with
only one data token circulating
[Muller 1962]

Theorem (this paper):
any hybrid combination of protocols is live if the
simple 4-phase protocol is not used
Proof: any cycle has at least one token
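The liveness criterion (every directed cycle carries at least one token) can be checked without enumerating cycles: drop all marked arcs and test whether the remaining graph is acyclic. The sketch below illustrates this; the arc set and both markings are hypothetical examples, not the graphs of the slides.

```python
def is_live(arcs, marking):
    """Commoner's criterion: live iff no directed cycle is token-free.

    A token-free cycle exists exactly when the subgraph made of the
    unmarked arcs contains a cycle, which a depth-first search detects.
    """
    succ = {}
    for src, dst in arcs:
        if marking.get((src, dst), 0) == 0:
            succ.setdefault(src, []).append(dst)

    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def has_cycle(node):
        color[node] = GREY
        for nxt in succ.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:                 # back edge: token-free cycle
                return True
            if state == WHITE and has_cycle(nxt):
                return True
        color[node] = BLACK
        return False

    return not any(color.get(n, WHITE) == WHITE and has_cycle(n)
                   for n in list(succ))

# Hypothetical two-latch graph (arcs assumed, see the lead-in).
ARCS = [("A+", "A-"), ("A-", "A+"), ("B+", "B-"), ("B-", "B+"),
        ("A+", "B-"), ("B-", "A+")]
GOOD = {("A-", "A+"): 1, ("B+", "B-"): 1, ("B-", "A+"): 1}
BAD = {("A-", "A+"): 1, ("B+", "B-"): 1}   # cycle A+ -> B- -> A+ has no token

print(is_live(ARCS, GOOD))
print(is_live(ARCS, BAD))
```

The same function applies unchanged to the larger graphs obtained by composing different protocols along a pipeline or a ring.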
Async DLX block diagram
[Figure: block diagram; label: Synchronous RTL]

            Synchronous    Desynchronized
Cycle       4.4 ns         4.45 ns
Power       70.9 mW        71.2 mW
Area        372,656 µm²    378,058 µm²

All numbers are after Placement & Routing
Total of 1500 flip-flops, 3000 latches
DE-SYNC design includes 5 controllers, each driving 2 clock trees
Power numbers include the clock tree
Technology: UMC/Virtual Silicon 0.18 µm
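The relative cost of de-synchronization follows directly from the figures above. The short calculation below is a sketch that assumes the first set of numbers (4.4 ns, 70.9 mW, 372,656) belongs to the synchronous design and the second set to the de-synchronized one; the dictionary keys are hypothetical names.

```python
# Post place-and-route figures quoted above (assignment to designs assumed).
sync = {"cycle_ns": 4.4, "power_mW": 70.9, "area": 372656}
desync = {"cycle_ns": 4.45, "power_mW": 71.2, "area": 378058}

# Relative overhead of the de-synchronized design, in percent.
overhead = {key: round(100 * (desync[key] - sync[key]) / sync[key], 2)
            for key in sync}

for key, percent in overhead.items():
    print(f"{key}: +{percent}%")
```

Under that assumption, each of the three overheads stays below 1.5%.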
Discussion

The de-synchronization model provides an
abstraction of the timing behavior
[Figure: timed graph over the nodes A to G, annotated with delay intervals [5,7], [2,3], [0,0], [3,5], [3,5], [1,2], [8,9], [2,4]]
• Timing analysis
• Exploration of the design space
Conclusions

EDA tools require formal support
(they must work for all circuits)

A complete characterization of 4-phase protocols
has been presented
(partial order based on concurrency)

Design flow developed at Cadence Berkeley Labs
• Automated from gate netlist
• Static timing analysis to derive matched delays
• Constrained P&R to meet timing constraints