Pipelining and Retiming
Prepared by Mark Jarvin
Agenda
Synchronous circuit retiming
Pipelining
Software pipelining
The Retiming Problem: Example
[Figure: circuit with inputs a, b, c and output y]
D = 4, T = 4
Latency = 4
Throughput = 4
How can this be improved? Pipelining?
The Retiming Problem: Example
[Figure: pipelined circuit with inputs a, b, c and output y]
Latency = 6
Throughput = 3
Delay is not balanced
This can still be improved
The Retiming Problem: Example
[Figure: retimed circuit with inputs a, b, c and output y]
Latency = 4
Throughput = 2
Now, delay is balanced
Observations
Some basic transformations can be used for cycle time reduction
The retiming transformation moves registers across gates
[Figure: a register moved across a gate]
Observations
Levelization doesn’t help
Only useful for acyclic circuits
Naïve Algorithm
while ( not timed ) {
    pick a candidate gate;
    apply retiming transformation;
    do timing analysis;
}
Questions
Can we apply retiming in batch mode, i.e., simultaneously on all gates?
Can we make sure the retimed circuit is optimal?
Can we achieve this in polynomial time?
Retiming Circuit Model
$G = (V, E, d, w)$
$V$: gates and primary inputs
$E \subseteq V \times V$
$d : V \to \mathbb{R}_{\ge 0}$: delay of gates
$w : E \to \mathbb{Z}_{\ge 0}$: # registers between 2 gates
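A minimal sketch of how this model might be held in code, assuming plain Python dictionaries for $d$ and $w$. The three-vertex graph and the helper names `check_model` and `clock_period` are illustrative and not taken from the slides. The cycle time is taken here as the largest total gate delay along any register-free path, and the helper assumes every cycle carries at least one register.

```python
# G = (V, E, d, w) as two dictionaries; the values below are illustrative.
delay = {"i": 0, "x": 2, "y": 3}                          # d: V -> gate delay
weight = {("i", "x"): 0, ("x", "y"): 1, ("i", "y"): 0}    # w: E -> register count


def check_model(delay, weight):
    """Raise ValueError unless E is within V x V, d >= 0 and w is a non-negative integer."""
    for v, d in delay.items():
        if d < 0:
            raise ValueError(f"negative delay on {v}")
    for (u, v), w in weight.items():
        if u not in delay or v not in delay:
            raise ValueError(f"edge ({u}, {v}) leaves V")
        if not isinstance(w, int) or w < 0:
            raise ValueError(f"bad register count on ({u}, {v})")


def clock_period(delay, weight):
    """Largest total gate delay along any register-free (zero-weight) path.

    Assumes every cycle carries at least one register; otherwise the
    recursion below would not terminate.
    """
    succ = {v: [] for v in delay}
    for (u, v), w in weight.items():
        if w == 0:
            succ[u].append(v)
    memo = {}

    def longest_from(v):
        if v not in memo:
            memo[v] = delay[v] + max((longest_from(s) for s in succ[v]), default=0)
        return memo[v]

    return max(longest_from(v) for v in delay)


check_model(delay, weight)
print(clock_period(delay, weight))  # 3
```

Moving the single register from the edge (x, y) to the edge (i, x) leaves the path x to y register-free, so the same helper then reports 2 + 3 = 5.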
Retiming Circuit Model: Example 1
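[Figure: circuit with inputs a, b, c and gates x, y, next to its graph model: vertices $v_i$ (delay 0), $v_x$ (delay 2), $v_y$ (delay 3); edge weights $w_{ix} = 0$, $w_{xy} = 1$, $w_{iy} = 0$]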
Retiming Circuit Model: Example 1
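[Figure: the same circuit with the register moved, next to its graph model: vertices $v_i$ (delay 0), $v_x$ (delay 2), $v_y$ (delay 3); edge weights $w_{ix} = 1$, $w_{xy} = 0$, $w_{iy} = 0$]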
Retiming Circuit Model: Example 2
[Figure: circuit with gates a through h and its graph model with vertices $v_a, \dots, v_h$; gate delays are 3 for a, b, c, d, 7 for e, f, g, and 0 for h]
Metrics
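For a path $p = v_i \to \dots \to v_j$:
Path delay: $d(p) = d(v_i, \dots, v_j) = \sum_{v_k \in p} d_k$
Path weight: $w(p) = w(v_i, \dots, v_j) = \sum_{e_k \in p} w_k$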
Metrics
Define weight and delay metrics for any given vertex pair:
$W(u, v) = \min_{p : u \leadsto v} w(p)$
$D(u, v) = \max_{p : u \leadsto v,\ w(p) = W(u, v)} d(p)$
Both quantities are undefined if there is no path p from u to v
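One way to compute both metrics at once is an all-pairs shortest-path pass over lexicographically ordered pairs (registers, negated delay). The sketch below uses Floyd-Warshall for brevity, which is a swapped-in choice rather than anything the slides prescribe. The edge list is an assumed reading of Example 2 (delays 3 for a to d, 7 for e to g, 0 for h), and the name `wd_matrices` is hypothetical.

```python
import itertools

INF = float("inf")

# Assumed reading of Example 2: gate delays and (u, v) -> register count.
DELAY = {"a": 3, "b": 3, "c": 3, "d": 3, "e": 7, "f": 7, "g": 7, "h": 0}
WEIGHT = {("h", "a"): 1, ("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1,
          ("d", "e"): 0, ("c", "e"): 0, ("b", "f"): 0, ("a", "g"): 0,
          ("e", "f"): 0, ("f", "g"): 0, ("g", "h"): 0}


def wd_matrices(delay, weight):
    """Return dictionaries W and D keyed by (u, v); unreachable pairs are omitted.

    Each edge u -> v is weighted (w(u, v), -d(u)). Minimising these pairs
    lexicographically minimises registers first and, among the minimum-register
    paths, maximises delay. Assumes no register-free cycle exists.
    """
    V = list(delay)
    best = {u: {v: (INF, 0) for v in V} for u in V}
    for u in V:
        best[u][u] = (0, 0)                      # the empty path from u to u
    for (u, v), w in weight.items():
        best[u][v] = min(best[u][v], (w, -delay[u]))
    for k, i, j in itertools.product(V, V, V):   # k is the outermost loop
        via = (best[i][k][0] + best[k][j][0], best[i][k][1] + best[k][j][1])
        if via < best[i][j]:
            best[i][j] = via
    W = {(u, v): best[u][v][0] for u in V for v in V if best[u][v][0] < INF}
    # The accumulated delay excludes the final vertex, so add d(v) back.
    D = {(u, v): delay[v] - best[u][v][1] for (u, v) in W}
    return W, D


W, D = wd_matrices(DELAY, WEIGHT)
print(W["a", "e"], D["a", "e"])  # 2 16
print(W["b", "a"], D["b", "a"])  # 1 20
```

The two printed pairs agree with the entries 2 (16) and 1 (20) in the W (D) matrix for Example 2.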
W (D) Matrix for Example 2
Entry in row u, column v is W(u, v), with D(u, v) in parentheses.

W (D)   a       b       c       d       e       f       g       h
a       0 (3)   1 (6)   2 (9)   3 (12)  2 (16)  1 (13)  0 (10)  0 (10)
b       1 (20)  0 (3)   1 (6)   2 (9)   1 (13)  0 (10)  0 (17)  0 (17)
c       1 (27)  2 (30)  0 (3)   1 (6)   0 (10)  0 (17)  0 (24)  0 (24)
d       1 (27)  2 (30)  3 (33)  0 (3)   0 (10)  0 (17)  0 (24)  0 (24)
e       1 (24)  2 (27)  3 (30)  4 (33)  0 (7)   0 (14)  0 (21)  0 (21)
f       1 (17)  2 (20)  3 (23)  4 (26)  3 (30)  0 (7)   0 (14)  0 (14)
g       1 (10)  2 (13)  3 (16)  4 (19)  3 (23)  2 (20)  0 (7)   0 (7)
h       1 (3)   2 (6)   3 (9)   4 (12)  3 (16)  2 (13)  1 (10)  0 (0)
The Retiming Transformation
How do we represent retiming?
How does it affect G?
Informally:
The transformation is fundamentally moving registers across gates
Represent it as the number of registers to push from a gate’s outputs to its inputs
Define this number for all gates
The Retiming Transformation
Definition: a retiming of a network $G = (V, E, d, w)$ is an integer-valued vertex labelling $r : V \to \mathbb{Z}$ that transforms $G$ into $G' = (V, E, d, w')$ where for each edge $(v_i, v_j) \in E$:
$w'_{ij} = w_{ij} + r_j - r_i$
The Retiming Transformation
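[Figure: the Example 1 graph before and after retiming; vertices $v_i$ (delay 0), $v_x$ (delay 2), $v_y$ (delay 3)]
Initially: $w_{ix} = 0$, $w_{xy} = 1$, $w_{iy} = 0$
Apply retiming: $r_i = 0$, $r_x = 1$, $r_y = 0$
Finally: $w'_{ix} = 1$, $w'_{xy} = 0$, $w'_{iy} = 0$
Note: retiming will change the number of registers in general, but not the number of registers in a given cycle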
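The edge update can be sketched in a few lines, assuming weights are stored as a dictionary keyed by (i, j). The function name `retime` and the three-gate cycle p, q, s are illustrative and not from the slides; the first call reproduces the Example 1 numbers above.

```python
def retime(weight, r):
    """Return the retimed weights w'(i, j) = w(i, j) + r[j] - r[i]."""
    return {(i, j): w + r[j] - r[i] for (i, j), w in weight.items()}


# Example 1: push one register from the output of x to its input.
weight = {("i", "x"): 0, ("x", "y"): 1, ("i", "y"): 0}
retimed = retime(weight, {"i": 0, "x": 1, "y": 0})
print(retimed)  # {('i', 'x'): 1, ('x', 'y'): 0, ('i', 'y'): 0}

# Hypothetical cycle p -> q -> s -> p: individual edges change,
# but the register count around the cycle does not.
cycle = {("p", "q"): 1, ("q", "s"): 0, ("s", "p"): 2}
cycle_retimed = retime(cycle, {"p": 0, "q": 1, "s": 1})
print(cycle_retimed)                                       # {('p', 'q'): 2, ('q', 's'): 0, ('s', 'p'): 1}
print(sum(cycle.values()), sum(cycle_retimed.values()))    # 3 3
```

The cycle total is preserved because each label $r_k$ is added once and subtracted once when the retimed weights are summed around the cycle.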
Legal and Feasible Retiming
A retiming is legal if the retimed network doesn’t contain negative weights:
$w'_{ij} = w_{ij} + r_j - r_i \ge 0$
$r_j - r_i \ge -w_{ij} \iff r_j \ge r_i - w_{ij}$
For a given cycle time $\phi$, the network is timing feasible if it can correctly operate under $\phi$
This holds if $W'(i, j) \ge 1$ for all $i, j$ with $D(i, j) > \phi$
Feasible Retiming
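Furthermore, along a minimum-weight path $v_i \to v_{k_1} \to \dots \to v_{k_m} \to v_j$:
$W'(i, j) = w'_{i k_1} + w'_{k_1 k_2} + \dots + w'_{k_m j}$
$= (w_{i k_1} + r_{k_1} - r_i) + (w_{k_1 k_2} + r_{k_2} - r_{k_1}) + \dots + (w_{k_m j} + r_j - r_{k_m})$
$= W(i, j) + r_j - r_i$
Finally:
$W(i, j) + r_j - r_i \ge 1 \iff r_j - r_i \ge 1 - W(i, j)$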
Feasible Test Algorithm
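Any retiming must satisfy the system of difference constraints:
$r_j - r_i \ge -W(i, j) \quad \forall (v_i, v_j) \in E$
$r_j - r_i \ge 1 - W(i, j) \quad \forall i, j : D(i, j) > \phi$, where $\phi$ is the target cycle time
General approach: integer linear programming
Special form: single-source longest path problem
Note: we can skip the second inequality wherever $D(i, j) - d_j > \phi$ or $D(i, j) - d_i > \phi$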
Feasible Test Algorithm
Longest path problem can be solved with Bellman-Ford
Build a constraint graph with an edge of length $b_k$ from i to j if we have a constraint of the form $r_j - r_i \ge b_k$
[Figure: constraint graph for Example 2 with clock period 13]
Feasible Test Algorithm
The solution is feasible if there are no positive cycles
If feasible, the longest distance of each vertex provides the retiming function
For the previous example, with reference node $v_h$:
$r_a = -1$, $r_b = -2$, $r_c = -2$, $r_d = -3$
$r_e = -2$, $r_f = -1$, $r_g = 0$, $r_h = 0$
There are no positive cycles; hence, 13 is a feasible clock period
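The test can be sketched end to end as follows. This is a minimal version under stated assumptions: the edge list is an assumed reading of Example 2, the names `wd_matrices` and `feasible_retiming` are hypothetical, the legality bound on each edge is taken as $-w_{ij}$, no redundant constraints are pruned, and every vertex is assumed reachable from the reference vertex in the constraint graph.

```python
import itertools

INF = float("inf")

# Assumed reading of Example 2: gate delays and (u, v) -> register count.
DELAY = {"a": 3, "b": 3, "c": 3, "d": 3, "e": 7, "f": 7, "g": 7, "h": 0}
WEIGHT = {("h", "a"): 1, ("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1,
          ("d", "e"): 0, ("c", "e"): 0, ("b", "f"): 0, ("a", "g"): 0,
          ("e", "f"): 0, ("f", "g"): 0, ("g", "h"): 0}


def wd_matrices(delay, weight):
    """All-pairs W and D via Floyd-Warshall on (registers, -delay) pairs."""
    V = list(delay)
    best = {u: {v: (INF, 0) for v in V} for u in V}
    for u in V:
        best[u][u] = (0, 0)
    for (u, v), w in weight.items():
        best[u][v] = min(best[u][v], (w, -delay[u]))
    for k, i, j in itertools.product(V, V, V):
        via = (best[i][k][0] + best[k][j][0], best[i][k][1] + best[k][j][1])
        if via < best[i][j]:
            best[i][j] = via
    W = {(u, v): best[u][v][0] for u in V for v in V if best[u][v][0] < INF}
    D = {(u, v): delay[v] - best[u][v][1] for (u, v) in W}
    return W, D


def feasible_retiming(delay, weight, period, ref):
    """Return a retiming that meets `period`, or None if a positive cycle exists."""
    W, D = wd_matrices(delay, weight)
    # Each constraint r[j] - r[i] >= b becomes an edge i -> j of length b.
    cons = [(i, j, -w) for (i, j), w in weight.items()]            # legality
    cons += [(i, j, 1 - W[i, j]) for (i, j) in W if D[i, j] > period]
    dist = {v: -INF for v in delay}
    dist[ref] = 0
    for _ in range(len(delay)):          # Bellman-Ford, longest-path variant
        changed = False
        for i, j, b in cons:
            if dist[i] + b > dist[j]:
                dist[j] = dist[i] + b
                changed = True
        if not changed:
            return dist                  # longest distances give r
    return None                          # still relaxing after |V| passes


print(feasible_retiming(DELAY, WEIGHT, 13, "h"))
# {'a': -1, 'b': -2, 'c': -2, 'd': -3, 'e': -2, 'f': -1, 'g': 0, 'h': 0}
print(feasible_retiming(DELAY, WEIGHT, 12, "h"))  # None
```

Other retimings can also meet a period of 13; the longest-path solution is simply the one this procedure returns for reference vertex h.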
Feasible Test Algorithm
Here, there is a positive cycle: $v_b \to v_e \to v_f \to v_g \to v_b$
Hence, a clock period of 12 is not feasible
[Figure: constraint graph for Example 2 with clock period 12]
Optimally Retimed Example Circuit
[Figure: Example 2 after optimal retiming, shown as a graph with vertices $v_a, \dots, v_h$ and as a circuit with gates a through h]
Optimal Retiming
Binary search of minimum cycle time
optimalRetiming ( G ) {
    min = 0;
    max = MAX;                        // assumed to be a feasible period
    while ( min ≠ max ) {
        mid = ( min + max ) / 2;      // integer division
        if ( feasibleTest ( G, mid ) )
            max = mid;
        else
            min = mid + 1;
    }
    return min;
}
Optimal Retiming
Do we really need to search all clock periods? No…
Optimal cycle time must be one of D(i,j)
So, sort and search $O(V^2)$ clock periods
Computing all D(i,j) requires $O(VE + V^2 \lg V)$ time
Overall, the complexity is $O(VE \lg V)$
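Restricting the search to the D(i,j) values might look like the sketch below. The function name `min_feasible_period` is hypothetical, the candidate list is the set of distinct delays in the W (D) matrix for Example 2, and the lambda is only a stand-in for the real feasibility test, encoding the slides' result that 13 is feasible and 12 is not. The search relies on feasibility being monotone in the clock period.

```python
def min_feasible_period(candidates, is_feasible):
    """Binary search for the smallest candidate period accepted by is_feasible.

    Assumes that if a period is feasible, every larger period is feasible too.
    """
    periods = sorted(set(candidates))
    if not periods or not is_feasible(periods[-1]):
        raise ValueError("no candidate period is feasible")
    lo, hi = 0, len(periods) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_feasible(periods[mid]):
            hi = mid            # periods[mid] works; look for something smaller
        else:
            lo = mid + 1        # periods[mid] fails; the answer lies above it
    return periods[lo]


# Distinct D(i, j) values from the W (D) matrix for Example 2.
candidates = [0, 3, 6, 7, 9, 10, 12, 13, 14, 16, 17, 19, 20, 21, 23, 24, 26, 27, 30, 33]

# Stand-in predicate; a real run would call the feasibility test on the circuit.
print(min_feasible_period(candidates, lambda period: period >= 13))  # 13
```

With 20 candidates the loop calls the feasibility test at most five times, plus one initial call on the largest candidate.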
Optimal Retiming
Can we do better? Yes…
Look at the delay-to-register ratios and maximum node delay of the cycles in the circuit, where delay-to-register ratio and maximum node delay are defined as:
$R(C) = \frac{\sum_{v \in C} d(v)}{\sum_{e \in C} w(e)}$
$D = \max \{ d(v) : v \in V \}$
Then, the minimum feasible clock period lies in the range:
$\max_{C \in G} \lceil R(C) \rceil \le \phi_{\min}(G) \le \max_{C \in G} \lceil R(C) \rceil + D - 1$
This improves the overall running time to $O(VE \lg D)$
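As a rough check of this range on Example 2, write $\phi_{\min}$ for the minimum feasible clock period and assume, reading the W (D) matrix, that $h \to a \to g \to h$ is a cycle carrying a single register and that it attains the maximum ratio:

```latex
R(C) = \frac{d(h) + d(a) + d(g)}{w(h,a) + w(a,g) + w(g,h)}
     = \frac{0 + 3 + 7}{1 + 0 + 0} = 10,
\qquad D = 7
\quad\Longrightarrow\quad
10 \le \phi_{\min}(G) \le 10 + 7 - 1 = 16
```

The earlier results for Example 2, with 13 feasible and 12 infeasible, fall inside this range.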
Pipelining
Can be thought of as a special case of retiming
[Figure: three repetitions of the stage sequence fetch, decode, execute, writeback]
Software Pipelining
This can also be thought of in terms of retiming
[Figure labels: Loop Boundary, Iteration]