Handshake Protocols for De-Synchronization - FORTH-ICS

Handshake protocols
for de-synchronization
I. Blunno, J. Cortadella, A. Kondratyev,
L. Lavagno, K. Lwin and C. Sotiriou
Politecnico di Torino, Italy
Universitat Politecnica de Catalunya, Barcelona, Spain
Cadence Berkeley Lab, Berkeley, USA
ICS-FORTH, Crete, Greece
Asynchronous
for dummies
Outline
What is de-synchronization ?
 Behavioral equivalence
 4-phase protocols for de-synchronization
 Concurrency
 Correctness
 An example

De-synchronize
[Figure: a synchronous circuit with its clock (CLK) is turned into an asynchronous circuit]
Synchronous circuit
[Figure: master-slave (MS) flip-flops, drawn as latches (L) labeled 0 and 1, all driven by CLK]
De-synchronization
[Figure: the same latches (L), each now driven by a local controller (C)]
De-synchronization
Distributed controllers substitute the clock network
The data path remains intact!
[Figure: controllers (C) in place of the clock network]
Design flow

Think synchronous

Design synchronous:
one clock and edge-triggered flip-flops

De-synchronize (automatically)

Run it asynchronously
Prior work

Micropipelines (Sutherland, 1989)

Local generation of clocks
 Varshavsky et al., 1995
 Kol and Ginosar, 1996

Theseus Logic (Ligthart et al., 2000)
 Commercial HDL synthesis tools
 Direct translation and special registers

Phased logic (Linder and Harden, 1996)
(Reese, Thornton, Traver, 2003)
 Conceptually similar
 Different handshake protocol (2 phase vs. 4 phase)
Automatic de-synchronization

Devise an automatic method for
de-synchronization

Identify a subclass of synchronous circuits
suitable for de-synchronization

Formally prove correctness
Behavioral equivalence
Flow equivalence [Le Guernic, Talpin, Le Lann, 2003]
relates the synchronous flow to the de-synchronized flow
Flow equivalence
[Figure: timing diagrams of the values stored in latches A and B, first clocked by CLK (synchronous behavior), then de-synchronized (de-synchronized behavior)]
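A minimal Python sketch of the idea, under the usual reading of flow equivalence: two behaviors are flow-equivalent when every latch stores the same sequence of values in both, regardless of when the values are stored or how events on different latches interleave. The trace encoding (a list of `(latch, value)` capture events) and the sample values are illustrative assumptions, not data from the slides.

```python
def flows(trace):
    """Project a trace of (latch, value) capture events onto each latch."""
    result = {}
    for latch, value in trace:
        result.setdefault(latch, []).append(value)
    return result


def flow_equivalent(trace1, trace2):
    """Same latches, and the same value sequence stored in each latch."""
    return flows(trace1) == flows(trace2)


# Hypothetical synchronous run: A and B capture in lock step.
sync = [("A", 1), ("B", 5), ("A", 3), ("B", 2), ("A", 4), ("B", 6)]
# Hypothetical de-synchronized run: same per-latch sequences, other interleaving.
desync = [("A", 1), ("A", 3), ("B", 5), ("A", 4), ("B", 2), ("B", 6)]
# A run in which B misses one value is not flow-equivalent.
lossy = [("A", 1), ("A", 3), ("B", 5), ("A", 4), ("B", 6)]

print(flow_equivalent(sync, desync))
print(flow_equivalent(sync, lossy))
```

Only the per-latch projections are compared, so the relative timing of A and B plays no role in the check.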
4-phase protocols for de-synchronization
[Figure: a chain of latches (L) with their controllers (C), animated together with the graph of latch events A+, A-, B+, B-, C+, C-, D+, D- for four latches A, B, C, D]
A latch cannot read another data item until
the successor has captured the current one
A latch cannot become opaque before having
captured the data item from its predecessor
[Figure: resulting event graph over A+, B+, C+, D+, A-, B-, C-, D-, and its restriction to two adjacent latches A and B]
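The two rules above can be played as a token game on a marked graph. The sketch below is one possible encoding, not the authors' tool: `X+` is read as latch X becoming transparent and `X-` as becoming opaque, each pair of adjacent latches contributes the arcs `pred+ -> succ-` and `succ- -> pred+`, and the initial marking (even-position latches opaque, odd-position latches transparent, one token on each `pred+ -> succ-` arc) is an assumption chosen for illustration.

```python
import random


def desync_model(latches):
    """Arcs (source event, target event) for a linear chain of latches."""
    arcs = []
    for x in latches:
        arcs += [(x + "+", x + "-"), (x + "-", x + "+")]  # open/close alternate
    for pred, succ in zip(latches, latches[1:]):
        arcs.append((pred + "+", succ + "-"))  # succ closes after pred opened
        arcs.append((succ + "-", pred + "+"))  # pred reopens after succ closed
    return arcs


def initial_marking(latches):
    """Assumed reset state: A, C opaque; B, D transparent and allowed to close."""
    marking = {}
    for i, x in enumerate(latches):
        arc = (x + "-", x + "+") if i % 2 == 0 else (x + "+", x + "-")
        marking[arc] = 1
    for pred, succ in zip(latches, latches[1:]):
        marking[(pred + "+", succ + "-")] = 1
    return marking


def enabled(arcs, marking):
    """Events whose input arcs all carry a token."""
    events = {t for _, t in arcs}
    return sorted(e for e in events
                  if all(marking.get(a, 0) > 0 for a in arcs if a[1] == e))


def fire(arcs, marking, event):
    """Consume one token per input arc, produce one per output arc."""
    new = dict(marking)
    for a in arcs:
        if a[1] == event:
            new[a] -= 1
    for a in arcs:
        if a[0] == event:
            new[a] = new.get(a, 0) + 1
    return new


def simulate(latches, steps, seed=0):
    rng = random.Random(seed)
    arcs, marking, trace = desync_model(latches), initial_marking(latches), []
    for _ in range(steps):
        choices = enabled(arcs, marking)
        if not choices:
            break  # deadlock; not expected with this marking
        event = rng.choice(choices)
        marking = fire(arcs, marking, event)
        trace.append(event)
    return trace


def alternates(trace, first, second):
    """True if the two events strictly alternate, starting with `first`."""
    seq = [e for e in trace if e in (first, second)]
    return all(e == (first if i % 2 == 0 else second) for i, e in enumerate(seq))


trace = simulate(list("ABCD"), 200)
print(trace[:8])
```

In any run, `B-` and `A+` strictly alternate: A never reopens before B has captured, and B never closes twice without A having opened in between.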
Concurrency
Can we increase concurrency?
[Figure: event graphs over A+, B+, A-, B- with more concurrency than the de-synchronization model, and the resulting data flow between latches A and B]
Not flow-equivalent: data overrun, data lost
Can we reduce concurrency? How much?
[Figure: event graphs over A+, B+, A-, B- with decreasing concurrency: 8 states, 6 states, 5 states, 4 states]
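The state counts quoted above can be reproduced by enumerating the reachable markings of a protocol's marked graph. The sketch below does this for one assumed encoding of the de-synchronization model on two latches (arcs `A+ -> B-` and `B- -> A+` on top of each latch's own open/close cycle); both the arc set and the initial marking are assumptions made for illustration.

```python
from collections import deque

# Assumed two-latch model: A is the predecessor of B.
ARCS = [("A+", "A-"), ("A-", "A+"),
        ("B+", "B-"), ("B-", "B+"),
        ("A+", "B-"), ("B-", "A+")]

# Arcs holding a token initially (assumption): A opaque, B transparent,
# B allowed to close. Every arc holds at most one token in this net.
INITIAL = frozenset({("A-", "A+"), ("B+", "B-"), ("A+", "B-")})


def successors(marking):
    """Yield (event, next marking) for every enabled event."""
    for event in sorted({t for _, t in ARCS}):
        inputs = {a for a in ARCS if a[1] == event}
        if inputs <= marking:
            outputs = {a for a in ARCS if a[0] == event}
            yield event, frozenset((marking - inputs) | outputs)


def reachable(initial):
    """Breadth-first exploration of the reachable markings."""
    seen, queue = {initial}, deque([initial])
    while queue:
        for _, nxt in successors(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(len(reachable(INITIAL)))  # 8
```

With this encoding the count is 8, which is consistent with the largest number of states listed above; adding ordering arcs (reducing concurrency) can only shrink the reachable set.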
[Figure: partial order of 4-phase protocols by concurrency, each drawn as an event graph over A+, B+, A-, B-. Protocols shown: de-synchronization model; fully decoupled (Furber & Day); GasP, IPCMOS; semi-decoupled (Furber & Day); simple 4-phase; non-overlapping]
[Figure: latches A and B with their controllers (cntrl) and handshake signals Ri, Ai, Rx, Ax, Ro, Ao; the signal transitions Ri+, Ai+, Ri-, Ai-, Rx+, Ax+, Rx-, Ax-, Ro+, Ao+, Ro-, Ao- are animated together with the latch events A-, B-, A+, B+]
(semi-decoupled 4-phase protocol)
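For readers new to the notation, each request/acknowledge pair follows the standard 4-phase (return-to-zero) discipline: request rises, acknowledge rises, request falls, acknowledge falls. The sketch below only checks that discipline on a trace of events; it does not model the controllers themselves, and the sample traces are made up for illustration.

```python
def is_four_phase(trace, req, ack):
    """Check that events on one req/ack pair follow req+, ack+, req-, ack-."""
    expected = [req + "+", ack + "+", req + "-", ack + "-"]
    # Keep only the events of this pair; the last character is the sign.
    relevant = [e for e in trace if e[:-1] in (req, ack)]
    return all(e == expected[i % 4] for i, e in enumerate(relevant))


# Hypothetical trace: events of other signals may interleave freely.
good = ["Ri+", "A-", "Ai+", "Ri-", "Ai-", "Ri+", "Ai+"]
# Request withdrawn before it was acknowledged: violates the protocol.
bad = ["Ri+", "Ri-", "Ai+"]

print(is_four_phase(good, "Ri", "Ai"))
print(is_four_phase(bad, "Ri", "Ai"))
```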
Correctness
Which protocols are valid
for de-synchronization ?
Theorem:
the de-synchronization protocol
preserves flow-equivalence
Proof: by induction on the length of the traces
Induction hypothesis: same latch values at reset
Induction step:
same values at cycle i ⇒ same values at cycle i+1
Theorem:
any reduction in concurrency preserves flow-equivalence
Any hybrid approach preserves
flow-equivalence!
[Figure: latches A, B, C, D whose controllers mix semi-decoupled, fully decoupled and non-overlapping protocols, with the event graph over A+, B+, C+, D+, A-, B-, C-, D-]
Flow-equivalence is preserved, … but …
Liveness

Preservation of flow-equivalence:
all the generated traces are equivalent

Are all traces generated ?
(Is the marked graph live ?)
Not always !
[Figure: event graph over A+, B+, C+, D+, A-, B-, C-, D-]
Semi-decoupled 4-phase handshake protocol
Liveness: all cycles have at least one token [Commoner 1971]
[Figure: event graph over A+, B+, C+, D+, A-, B-, C-, D-]
Simple 4-phase handshake protocol
Results about liveness

At least three latches in a ring are required with
only one data token circulating
[Muller 1962]

Theorem (this paper):
any hybrid combination of protocols is live if the
simple 4-phase protocol is not used
Proof: any cycle has at least one token
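The liveness condition used in the proof (every directed cycle carries at least one token) is easy to check mechanically: delete all arcs that hold a token and test whether the remaining graph still contains a cycle. The sketch below applies that check to an assumed two-latch graph; the arc set and both markings are illustrative assumptions, not the protocols from the slides.

```python
def is_live(arcs, marked):
    """Live iff the subgraph of token-free arcs has no directed cycle."""
    graph = {}
    for s, t in arcs:
        graph.setdefault(s, [])
        graph.setdefault(t, [])
        if (s, t) not in marked:
            graph[s].append(t)  # keep only arcs without a token

    WHITE, GREY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}

    def has_cycle(v):
        # Depth-first search; reaching a GREY node closes a cycle.
        color[v] = GREY
        for w in graph[v]:
            if color[w] == GREY or (color[w] == WHITE and has_cycle(w)):
                return True
        color[v] = BLACK
        return False

    return not any(color[v] == WHITE and has_cycle(v) for v in graph)


# Assumed two-latch event graph (A precedes B).
ARCS = [("A+", "A-"), ("A-", "A+"),
        ("B+", "B-"), ("B-", "B+"),
        ("A+", "B-"), ("B-", "A+")]

# One token on every cycle: live.
LIVE = {("A-", "A+"), ("B+", "B-"), ("A+", "B-")}
# Token removed from the A+ / B- cycle: that cycle can never fire again.
DEAD = {("A-", "A+"), ("B+", "B-")}

print(is_live(ARCS, LIVE))
print(is_live(ARCS, DEAD))
```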
An example
Async DLX block diagram
Synchronous RTL
         Synchronous     Desynchronized
Cycle:   4.4 ns          4.45 ns
Power:   70.9 mW         71.2 mW
Area:    372,656 µm²     378,058 µm²
All numbers are after Placement & Routing
Total of 1500 flip-flops, 3000 latches
DE-SYNC design includes 5 controllers, each driving 2 clock trees
Power numbers include the clock tree
Technology: UMC/Virtual Silicon 0.18 µm
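The relative cost of de-synchronization follows directly from the post-layout figures above. The short Python sketch below works out the overhead of the de-synchronized design over the synchronous one as a percentage; the dictionary keys are names chosen here for illustration.

```python
# Post-layout figures quoted above (area in the unit given on the slide).
sync = {"cycle_ns": 4.4, "power_mW": 70.9, "area": 372656}
desync = {"cycle_ns": 4.45, "power_mW": 71.2, "area": 378058}


def overhead_percent(base, new):
    """Relative increase of each metric, in percent, rounded to 2 decimals."""
    return {k: round(100.0 * (new[k] - base[k]) / base[k], 2) for k in base}


overhead = overhead_percent(sync, desync)
print(overhead)
```

All three overheads come out below 1.5%: roughly 1.14% in cycle time, 0.42% in power and 1.45% in area.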
De-synchronized DLX on FPGA
(demo outside the conference room)
Discussion

The de-synchronization model provides an
abstraction of the timing behavior
[Figure: timed graph with nodes A to G and delay intervals [5,7], [2,3], [0,0], [3,5], [3,5], [1,2], [8,9], [2,4]]
• Timing analysis
• Exploration of the design space
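One common form of timing analysis on such a model treats it as a timed marked graph, whose cycle time is the maximum, over all directed cycles, of the total delay on the cycle divided by the number of tokens on it. The sketch below computes that ratio by brute-force cycle enumeration, which is only practical for small graphs. The graph, the token placement and the delays (single worst-case values rather than intervals) are hypothetical and are not the ones in the figure.

```python
def simple_cycles(arcs):
    """Yield each simple cycle once, as a list of arcs (small graphs only)."""
    graph = {}
    for s, t in arcs:
        graph.setdefault(s, []).append(t)
        graph.setdefault(t, [])
    for start in sorted(graph):
        # Only visit nodes greater than `start`, so that every cycle is
        # reported exactly once, from its smallest node.
        stack = [(start, [start])]
        while stack:
            v, path = stack.pop()
            for w in graph[v]:
                if w == start:
                    yield list(zip(path, path[1:] + [start]))
                elif w > start and w not in path:
                    stack.append((w, path + [w]))


def cycle_time(arcs, delay, tokens):
    """Maximum over cycles of (sum of delays) / (number of tokens)."""
    worst = 0.0
    for cycle in simple_cycles(arcs):
        total_tokens = sum(tokens.get(a, 0) for a in cycle)
        if total_tokens == 0:
            raise ValueError("token-free cycle: the graph is not live")
        worst = max(worst, sum(delay[a] for a in cycle) / total_tokens)
    return worst


# Hypothetical two-latch event graph with worst-case delays per arc.
ARCS = [("A+", "A-"), ("A-", "A+"),
        ("B+", "B-"), ("B-", "B+"),
        ("A+", "B-"), ("B-", "A+")]
DELAY = {("A+", "A-"): 2, ("A-", "A+"): 1,
         ("B+", "B-"): 2, ("B-", "B+"): 1,
         ("A+", "B-"): 5, ("B-", "A+"): 1}
TOKENS = {("A-", "A+"): 1, ("B+", "B-"): 1, ("A+", "B-"): 1}

print(cycle_time(ARCS, DELAY, TOKENS))  # 6.0
```

Here the cycle through `A+` and `B-` dominates (delay 6 with one token), so shortening the other arcs would not improve the cycle time.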
Conclusions



EDA tools require formal support
(they must work for all circuits)
A complete characterization of 4-phase protocols
has been presented
(partial order based on concurrency)
Design flow developed at Cadence Berkeley Labs
 Automated from gate netlist
 Static timing analysis to derive matched delays
 Constrained P&R to meet timing constraints