Circuit Retiming with
Interconnect Delay
CUHK CSE CAD Group Meeting One
Evangeline Young
Aug 19, 2003
Circuit Retiming
 Given a circuit, we want to relocate the
registers to achieve a better clock period.
[Figure: retiming relocates the registers and reduces the clock period from 3 units to 2 units.]
Circuit Retiming
 In order to maintain the functionality of the
circuit, registers can only be moved in certain
ways:
[Figure: the allowed register moves under retiming.]
Circuit Retiming
 Given a circuit, how should we place the
registers to minimize the clock period?
Traditional Approach
 This retiming problem was first introduced in
the following classic paper:

“Retiming Synchronous Circuitry”, Charles
E. Leiserson and James B. Saxe,
Algorithmica, 6:5-35, 1991
 Only gate delay was considered.
 Three methods are proposed. One of them
solves the problem by mixed integer linear
programming (MILP).
Traditional Approach
 Notation:
d(v) is the delay of node v.
w(e) is the original number of registers on edge e.
c is the clock period whose feasibility we want
to check.
r(v) is the retiming value of node v, i.e., the number
of registers moved from the output to the input
of node v. (r(v) is what we want to find.)
s(v) is the longest delay from a register
connected directly to node v to the output of v.
Traditional Approach
 More about s(v)…
s(v) is the delay from point A to point B,
including the delay of v.
[Figure: a path from point A to point B at the output of node v.]
Traditional Approach
 Integer Linear Program:
(1) d(v) ≤ s(v)              for all nodes v
(2) s(v) ≤ c                 for all nodes v
(3) r(u) - r(v) ≤ w(e)       for all edges e(u,v)
(4) s(u) - s(v) ≤ -d(v)      for all edges e(u,v) such that r(u) - r(v) = w(e)
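As a sanity check on the four constraints, the sketch below tests whether a candidate assignment of r(v) and s(v) satisfies them for a given c. The function name `satisfies_ilp` and the dictionary-based graph encoding are assumptions made for illustration, not part of the original formulation.

```python
def satisfies_ilp(d, w, r, s, c, eps=1e-9):
    """Return True if (r, s) satisfies constraints (1)-(4) for clock period c.

    d: node -> gate delay d(v)
    w: (u, v) -> original register count w(e) on edge e(u, v)
    r: node -> integer retiming value r(v)
    s: node -> arrival time s(v)
    """
    for v in d:
        if d[v] > s[v] + eps:      # (1) d(v) <= s(v)
            return False
        if s[v] > c + eps:         # (2) s(v) <= c
            return False
    for (u, v), regs in w.items():
        if r[u] - r[v] > regs:     # (3) r(u) - r(v) <= w(e)
            return False
        # (4) applies only to edges left with no register after retiming
        if r[u] - r[v] == regs and s[u] - s[v] > -d[v] + eps:
            return False
    return True
```

For a two-gate loop with d(a) = 1, d(b) = 2, no register on edge (a, b) and one register on edge (b, a), the assignment r = 0, s(a) = 1, s(b) = 3 passes for c = 3 and fails for c = 2.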
Traditional Approach
 Write R(v) as r(v) + s(v)/c
 The ILP can be written as an MILP:
(1) r(v) - R(v) ≤ -d(v)/c          for all nodes v
(2) R(v) - r(v) ≤ 1                for all nodes v
(3) r(u) - r(v) ≤ w(e)             for all edges e(u,v)
(4) R(u) - R(v) ≤ w(e) - d(v)/c    for all edges e(u,v)
 The above set of difference constraints can
be solved in polynomial time, though it
consists of both integer and real variables.
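For reference, constraint (4) of the MILP can be derived as sketched below. The first line is the unconditional form of ILP constraint (4); treating it as valid on every edge is an assumption of this sketch, justified by the fact that when at least one register remains on e(u,v) it is already implied by s(u) ≤ c and d(v) ≤ s(v).

```latex
\begin{align*}
s(v) &\ge s(u) + d(v) - c\,\bigl(r(v) - r(u) + w(e)\bigr) \\
\frac{s(v)}{c} + r(v) &\ge \frac{s(u)}{c} + r(u) + \frac{d(v)}{c} - w(e) \\
R(v) &\ge R(u) + \frac{d(v)}{c} - w(e) \\
R(u) - R(v) &\le w(e) - \frac{d(v)}{c}
\end{align*}
```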
Traditional Approach
 Use binary search to find the optimal clock period:
T0 = 0
T1 = e10 // a large number
Repeat
    c = (T0 + T1)/2
    Check whether c is a feasible clock period by solving the MILP.
    If feasible, T1 = c; otherwise, T0 = c.
Until feasible and (T1 - T0)/T1 < ε
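The search loop can be sketched in Python as follows. The feasibility test is passed in as a callable, so any solver can be plugged in; the name `min_clock_period`, the default upper bound of 1e10, and the default tolerance are assumptions of this sketch. Unlike the slide's loop, it simply returns the upper end of the bracket, which is feasible provided the initial upper bound is feasible and some positive clock period is infeasible.

```python
from typing import Callable


def min_clock_period(is_feasible: Callable[[float], bool],
                     upper: float = 1e10,
                     eps: float = 1e-6) -> float:
    """Binary-search the smallest feasible clock period in (0, upper]."""
    t0, t1 = 0.0, upper              # t1 is assumed feasible, t0 infeasible
    while (t1 - t0) / t1 >= eps:     # stop once the relative gap is below eps
        c = (t0 + t1) / 2
        if is_feasible(c):
            t1 = c                   # c works: tighten the upper end
        else:
            t0 = c                   # c fails: raise the lower end
    return t1
```

With a toy oracle such as `lambda c: c >= 3.0`, the function returns a value just above 3.0.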
Retiming with Interconnect Delay
 We consider clock period minimization.
 Retiming has been studied and applied
extensively in logic synthesis.
 However, most previous retiming algorithms
ignore interconnect delay.
 Interconnect delay should be considered for
high-performance circuits in DSM design.
 This solution will be presented at the
upcoming ICCAD 2003.
Retiming with Interconnect Delay
 We assume that the delay of a wire is directly
proportional to its length.
 This assumption is reasonable:
For short wires, the quadratic component of the
wire delay is significantly smaller than its linear
component.
For long wires, buffer insertion can be done.
Retiming with Interconnect Delay
 Now, a retiming solution needs to specify:


the retiming label r(v) for each node v.
the positions of the registers on each edge.
[Figure: a retiming example with node labels r( ) = 0, r( ) = 0, r( ) = -1, and r( ) = 0.]
The positions of the registers
on the edges are important because
there is interconnect delay.
Our Contributions
 Optimal algorithm:

O(|V||E| log |V| + |V|2 log2|V|) time per iteration.
 Near-optimal algorithm:
Clock period only 0.13% larger than the optimal on average.
O(|Vb||E| + |Vb||Eh|) time per iteration; e.g., a
circuit with 16.1K gates and 28.6K wires can
be retimed in 44.32 s on a 1.8 GHz PIII PC.
Based on an optimal algorithm handling
interconnect delay only, i.e., no gate delay.
Optimal Approach
 Rewrite the ILP of the traditional approach as follows,
where d(e) is the delay of edge e:
(1) d(v) ≤ s(v)              for all nodes v
(2) s(v) ≤ c                 for all nodes v
(3) r(u) - r(v) ≤ w(e)       for all edges e(u,v)
(4) s(v) ≥ s(u) + d(e) + d(v) - c(r(v) - r(u) + w(e))
                             for all edges e(u,v)
Optimal Approach
 Similarly, write R(v) as r(v) + s(v)/c:
(1) r(v) - R(v) ≤ -d(v)/c                   for all nodes v
(2) R(v) - r(v) ≤ 1                         for all nodes v
(3) r(u) - r(v) ≤ w(e)                      for all edges e(u,v)
(4) R(u) - R(v) ≤ w(e) - d(v)/c - d(e)/c    for all edges e(u,v)
 Again, the above set of constraints can be
solved in polynomial time, though the runtime
is quite long.
Optimal Approach
Circuit   |V|     |E|     copt    Runtime (s)
s1488     655     1405    18.85   5.62
s1494     649     1411    20.78   4.37
s3271     1574    2707    10.24   33.70
s3330     1791    2890    27.05   43.14
s3384     1687    2782    24.16   25.19
s4863     2344    4093    23.58   87.75
s5378     2781    4261    27.25   138.68
s6669     3082    5399    22.96   177.59
s9234     5599    8005    42.73   512.86
s13207    7953    11302   72.34   1161.07
s15850    9774    13794   67.82   1545.59
s35932    16067   28590   29.54   8644.27
s38417    22181   32135   36.52   7680.79
s38584    19255   33010   -       >15000
Near Optimal Approach
 Transform the original graph G by splitting
each node v (representing a gate) into a pair of
nodes v1 and v2 connected by an edge with
delay d(v).
[Figure: node v (delay = d(v)) is replaced by nodes v1 and v2 (delay = 0) joined by an edge with delay = d(v).]
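The node-splitting transformation can be sketched as below. The function name `split_gates`, the tuple encoding of edges, and the choice to start each internal edge with zero registers are assumptions of this sketch rather than details given on the slide.

```python
def split_gates(d, edges):
    """Replace every gate v by nodes (v, 1) and (v, 2) joined by a wire of delay d(v).

    d: node -> gate delay d(v)
    edges: list of (u, v, w, delay) with register count w and wire delay
    Returns a list of edges in the same format over the split nodes.
    """
    new_edges = []
    for v, gate_delay in d.items():
        # Internal wire standing in for gate v; assumed to start with no registers.
        new_edges.append(((v, 1), (v, 2), 0, gate_delay))
    for u, v, w, delay in edges:
        # An original wire now runs from the output half of u to the input half of v.
        new_edges.append(((u, 2), (v, 1), w, delay))
    return new_edges
```

Every node of the transformed graph has zero delay, so all delay sits on edges and the gate-free retiming algorithm applies.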
Near Optimal Approach
 After representing each gate by a wire, we
can find an optimal retiming solution S for the
transformed circuit G1. (We will show how to
find the optimal solution with no gate delay.)
 The clock period of S will be a lower bound L
for the optimal solution Topt of G.
 From S, we can obtain a feasible retiming
solution for the original circuit G.
Near Optimal Approach
 Registers retimed into a wire representing a
gate v will be moved backward to the input
edges or forward to the output edges
depending on their distances from v1 and v2.
 Linear programming is used to determine the
positions of the registers on each edge after
this relocation step to minimize the clock
period considering both gate and wire delay.
Near Optimal Approach
 What remains is to solve the retiming
problem optimally assuming that gate delay is
zero.
 When there is no gate delay, the MILP
constraints of the optimal approach become:
(1) r(v) - R(v) ≤ 0                for all nodes v
(2) R(v) - r(v) ≤ 1                for all nodes v
(3) r(u) - r(v) ≤ w(e)             for all edges e(u,v)
(4) R(u) - R(v) ≤ w(e) - d(e)/c    for all edges e(u,v)
Near Optimal Approach
 Lemma 1: Given R(v) for all nodes v that
satisfy constraint (4), we can obtain a solution
to constraints (1)-(4) by setting
r(v) = trunc(R(v))
 Given Lemma 1, we only need to solve
constraint (4): R(u) - R(v) ≤ w(e) - d(e)/c for all
edges e(u,v).
 Consider the input graph G(V,E) in which the
weight of each edge e(u,v) is -w(e) + d(e)/c.
Near Optimal Approach
 There is a solution to constraint (4) iff G has
no positive cycles.
 Positive cycle detection in G can be achieved
by positive cycle detection in a smaller graph
H(Vb,Eh) constructed from G. This technique
can be applied in other positive cycle detection
problems, not necessarily in circuit retiming.
 After solving for R(v), we can find r(v) and s(v) for
all nodes v, since s(v) = c(R(v) - r(v)).
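A minimal sketch of the feasibility test for a fixed c is given below. It runs Bellman-Ford longest-path relaxation directly on G with edge weights -w(e) + d(e)/c, not on the reduced graph H, whose construction is not described here. The name `solve_constraint_4` and the edge encoding are assumptions. Because every R(v) starts at 0 and only increases, floor and trunc coincide when r(v) is extracted.

```python
import math


def solve_constraint_4(nodes, edges, c):
    """Find R(v) with R(u) - R(v) <= w(e) - d(e)/c on every edge, or None.

    nodes: iterable of node names
    edges: list of (u, v, w, delay) tuples for edge e(u, v)
    Starting every node at 0 is equivalent to adding a virtual source;
    a relaxation that still succeeds in round |V| reveals a positive cycle.
    """
    nodes = list(nodes)
    R = {v: 0.0 for v in nodes}
    for _ in range(len(nodes)):
        changed = False
        for u, v, w, delay in edges:
            cand = R[u] - w + delay / c   # longest-path relaxation
            if cand > R[v] + 1e-12:
                R[v] = cand
                changed = True
        if not changed:
            # Converged: apply Lemma 1 to obtain integer retiming values.
            r = {v: math.floor(R[v]) for v in nodes}
            return R, r
    return None  # positive cycle: c is infeasible
```

On a two-node loop with edges (a, b, w=0, delay=3) and (b, a, w=2, delay=1), the test succeeds for c = 2 with r(a) = 0 and r(b) = 1, and fails for c = 1.9.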
Near Optimal Approach
 After the binary search, we can find the
optimal clock and the corresponding r(v) and
s(v) for all node v. Then, we can place the
registers accordingly:
Example: assume that r(v) - r(u) + w(e) = 4.
[Figure: an edge from u to v with distances labeled c - s(u) and c.]
Other registers are placed
right in front of v.
Near Optimal Approach
 First, assume that gate delay is zero.
 Use binary search to find the minimum feasible
clock period c.
 To test the feasibility of a fixed c:
Transform the test into a positive cycle detection
problem on a reduced graph.
Solve it with a single-source longest-path
algorithm.
Results
Circuit   copt    cnear opt   Topt (s)   Tnear opt (s)
s1488     18.82   18.85       5.62       0.28
s1494     20.78   20.78       4.37       0.25
s3271     10.24   10.24       33.70      1.09
s3330     27.05   27.05       43.14      0.50
s3384     24.16   24.21       25.19      0.74
s4863     23.58   23.58       87.75      3.12
s5378     27.25   27.27       138.68     1.16
s6669     22.96   23.07       177.59     1.91
s9234     42.73   42.73       512.86     4.08
s13207    72.34   72.34       1161.07    8.11
s15850    67.82   67.82       1545.59    24.02
s35932    29.54   29.59       8644.27    61.25
s38417    36.52   36.53       7680.79    83.56
s38584    -       94.26       >15000     445.63
Future Directions
 Consider a more accurate modeling for the
interconnect delay, e.g., use Elmore delay.
 How to map the retiming solution to the
floorplanning or placement solution?
Registers are large and take up silicon
resources.
 How to consider fan-out capacitance with
interconnect delay?