Temporal Logic Replication for
Dynamically Reconfigurable FPGA
Partitioning
Wai-Kei Mak
Dept. of Computer Science and Engineering
University of South Florida
Evangeline F.Y. Young
Dept. of Computer Science and Engineering
The Chinese University of Hong Kong
Outline
I. Dynamically reconfigurable FPGA
II. Temporal partitioning = Conventional
partitioning?
III. Temporal logic replication
What?
Why?
How?
IV. Experimental results
V. Conclusions
Dynamically Reconfigurable FPGA
Store multiple contexts on chip.
Reuse logic blocks and wire segments
dynamically.
The contexts stored can correspond to the
multiple stages of a large circuit.
Temporal Circuit Partitioning
Temporal partitioning
multiple stages execute sequentially
Spatial partitioning
multiple components execute concurrently
Temporal Logic Replication
Can reduce buffering requirement.
Effectively utilize available slack logic capacity.
Temporal Constraints
For a net n = (v1, {v2, …, vp}) driven by v1, with s(v) the stage assigned to node v,
require s(v1) ≤ s(vj), j=2,…,p, if v1 is a combinational node
Temporal Constraints (Cont’d)
require s(vj) ≤ s(v1), j=2,…,p, if v1 is a flip-flop node
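The two constraints can be checked mechanically. The sketch below is illustrative only: it assumes the relation in both constraints is ≤ (a combinational driver is scheduled no later than its sinks; the sinks of a flip-flop are scheduled no later than the flip-flop). The names `violated_constraints`, `nets`, `stage`, and `flip_flops` are hypothetical and do not come from the slides.

```python
from typing import Dict, List, Set, Tuple

# A net is (driver, [sinks]); `stage` maps each node v to its stage s(v).
Net = Tuple[str, List[str]]

def violated_constraints(nets: List[Net],
                         stage: Dict[str, int],
                         flip_flops: Set[str]) -> List[Tuple[str, str]]:
    """Return the (driver, sink) pairs that break a temporal constraint."""
    bad = []
    for driver, sinks in nets:
        for sink in sinks:
            if driver in flip_flops:
                # Assumed flip-flop rule: s(vj) <= s(v1)
                ok = stage[sink] <= stage[driver]
            else:
                # Assumed combinational rule: s(v1) <= s(vj)
                ok = stage[driver] <= stage[sink]
            if not ok:
                bad.append((driver, sink))
    return bad

# Toy circuit: a and b are combinational, f is a flip-flop feeding back to a.
nets = [("a", ["b", "f"]), ("b", ["f"]), ("f", ["a"])]
flip_flops = {"f"}
feasible = {"a": 1, "b": 2, "f": 2}
infeasible = {"a": 2, "b": 1, "f": 2}

print(violated_constraints(nets, feasible, flip_flops))    # []
print(violated_constraints(nets, infeasible, flip_flops))  # [('a', 'b')]
```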
Temporal Partitioning with
Replication
Problem: Partition the given circuit into a pre-defined
number of stages, satisfying all temporal constraints.
Objective: Minimize buffers required between
stages.
Proposal: Utilize available slack logic capacity to
reduce signal buffering.
Solution: An effective 2-step approach.
2-Step Approach
Step 1: Compute a temporal partition w/o replication.
Step 2: Repeatedly identify the bottleneck stage and
apply replication for that stage.
Advantages of 2-Step Approach
Will not replicate unnecessarily.
All temporal constraints are already satisfied
when replicating.
Min-Area Min-Cut Replication
Let stage i be the bottleneck stage.
Min-Cut Replication
Compute a subset of nodes Ri in stage i for
replication into stage i+1 to maximally reduce
the communication cost at stage i.
Min-Area Min-Cut Replication
Compute a minimum-size subset of nodes Ri in
stage i for replication into stage i+1 to
maximally reduce the communication cost at
stage i.
Optimal Solution for Min-Area
Min-Cut Replication
Let Vi = set of nodes in stage i.
Observation 1:
The min-cut replication problem can be solved by
computing a minimum cut (Vi-Ri,Ri) in stage i.
Observation 2:
The min-area min-cut replication problem can be
solved by computing a minimum cut (Vi-Ri,Ri)
in stage i s.t. |Ri| is minimized.
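Observation 2 can be illustrated with a small max-flow sketch. It assumes stage i has already been modeled as a flow network with a source s and a sink t, and that the sink side of the cut plays the role of Ri; the slides' own network model is not reproduced here. The names `cap`, `max_flow_residual`, and `smallest_sink_side` are hypothetical. The sketch relies on a standard fact: after a maximum flow, the nodes that can still reach t in the residual graph form the smallest sink side among all minimum cuts.

```python
from collections import deque

def max_flow_residual(cap, s, t):
    """Edmonds-Karp max-flow. Returns (flow value, residual capacities)."""
    res = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u, nbrs in cap.items():
        for v in nbrs:
            res.setdefault(v, {}).setdefault(u, 0)  # reverse residual edge
    res.setdefault(s, {})
    res.setdefault(t, {})
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow, res
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        flow += push

def smallest_sink_side(res, t):
    """Nodes that can reach t in the residual graph (t included)."""
    side, queue = {t}, deque([t])
    while queue:
        v = queue.popleft()
        for u in res:
            if u not in side and res[u].get(v, 0) > 0:
                side.add(u)
                queue.append(u)
    return side

# Chain s -> a -> b -> t with two minimum cuts of value 1:
# ({s}, {a, b, t}) and ({s, a}, {b, t}). The second has the smaller sink side.
cap = {"s": {"a": 1}, "a": {"b": 1}, "b": {"t": 2}}
flow, res = max_flow_residual(cap, "s", "t")
print(flow)                                  # 1
print(sorted(smallest_sink_side(res, "t")))  # ['b', 't']
```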
Example
A pre-partition:
Computing a minimum cut in stage 2:
Example (Cont’d)
Computed R2 = {j}
Network Modeling
Need to ensure that
cut size = buffer requirement
For a net (v1, {v2, …, vp}),
The Case of Limited Slack Logic
Capacity
The solution of min-area min-cut replication suffices
if slack logic capacity is sufficiently large.
Otherwise, if |Ri| exceeds the slack, use a
heuristic to reduce Ri.
Use a repeated max-flow min-cut heuristic to
gradually reduce Ri (so cut size is only increased
gradually).
H. Yang, D.F. Wong, “Efficient Network Flow based
Min-Cut Balanced Partitioning”, ICCAD’94.
Algorithm
Input: Stage area bound A.
1. Network modeling for bottleneck stage i.
2. Compute min-cut (Vi-Ri,Ri) s.t. |Ri| is
minimized.
3. If |Vi+1|+|Ri| ≤ A, stop and return Ri.
4. Collapse a node in Ri with all nodes in
Vi-Ri, goto 2.
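The loop in steps 2 to 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the flow network (`cap`, source `s`, sink `t`) is taken as given, nodes have unit area, the sink `t` is an artificial terminal that is not counted in Ri, "collapsing" a node into Vi-Ri is modeled by an infinite-capacity edge from the source, and the node to collapse is picked arbitrarily (`min(r)`), since the slides do not say which node is chosen. All function and parameter names are hypothetical.

```python
from collections import deque

INF = float("inf")

def min_cut_sink_side(cap, s, t):
    """Return (cut value, smallest sink side) via Edmonds-Karp max-flow."""
    res = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u, nbrs in cap.items():
        for v in nbrs:
            res.setdefault(v, {}).setdefault(u, 0)  # reverse residual edge
    res.setdefault(s, {})
    res.setdefault(t, {})
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        flow += push
    # Nodes that can still reach t in the residual graph: smallest sink side.
    side, queue = {t}, deque([t])
    while queue:
        v = queue.popleft()
        for u in res:
            if u not in side and res[u].get(v, 0) > 0:
                side.add(u)
                queue.append(u)
    return flow, side

def replicate_within_area(cap, s, t, next_stage_size, area_bound):
    """Shrink R_i until |V_{i+1}| + |R_i| <= A. Returns (R_i, cut value)."""
    cap = {u: dict(nbrs) for u, nbrs in cap.items()}  # work on a copy
    while True:
        cut, side = min_cut_sink_side(cap, s, t)       # step 2
        r = side - {t}
        if not r or next_stage_size + len(r) <= area_bound:
            return r, cut                              # step 3
        node = min(r)  # assumption: arbitrary but deterministic choice
        cap.setdefault(s, {})[node] = INF              # step 4: collapse

cap = {"s": {"a": 3}, "a": {"b": 1, "c": 1}, "b": {"t": 2}, "c": {"t": 2}}
print(replicate_within_area(cap, "s", "t", next_stage_size=4, area_bound=6))
print(replicate_within_area(cap, "s", "t", next_stage_size=4, area_bound=5))
```

With an area bound of 6 the unconstrained answer {b, c} (cut 2) fits. With a bound of 5 one node is collapsed and the result shrinks to {c}, at the price of a cut of 3.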
Experimental Results
Circuit   #buf w/o rep.   #buf w/ rep.   Imprv. %   Rep. %
C3540               198            194       2.02     0.48
C5315               140            129       7.86     0.67
C6288                83             63      24.10     4.41
C7552               210            176      16.19     3.12
S13207              688            669       2.76     2.54
S15850              761            699       8.15     3.59
S35932             2729           2636       3.41     2.48
S38417             2194           2104       4.10     0.63
S38584             2280           2137       6.27     0.98
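The improvement column can be reproduced from the two buffer counts, assuming "Imprv. %" means the relative reduction in buffers, (without - with) / without × 100. The helper name `improvement_percent` is hypothetical. The "Rep. %" column cannot be recomputed from the values shown here.

```python
# (#buf without replication, #buf with replication), copied from the table.
buffers = {
    "C3540": (198, 194),
    "C5315": (140, 129),
    "C6288": (83, 63),
    "C7552": (210, 176),
    "S13207": (688, 669),
    "S15850": (761, 699),
    "S35932": (2729, 2636),
    "S38417": (2194, 2104),
    "S38584": (2280, 2137),
}

def improvement_percent(before: int, after: int) -> float:
    """Relative buffer reduction in percent, rounded to two decimals."""
    return round(100.0 * (before - after) / before, 2)

for name, (before, after) in buffers.items():
    print(f"{name:7s} {improvement_percent(before, after):6.2f}")
```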
Conclusions
Proposed temporal logic replication to reduce
buffering requirement in DRFPGA partitioning.
Presented an effective 2-step approach.
Formulated and optimally solved the min-area
min-cut replication problem.
Extended to case of limited slack logic capacity.
In the paper, a new timing-driven temporal
partitioning algorithm is introduced to
compute the pre-partition.