Circuit Retiming with Interconnect Delay
CUHK CSE CAD Group Meeting One
Evangeline Young
Aug 19, 2003

Circuit Retiming
Given a circuit, we want to relocate the registers to achieve a better clock period.
[Figure: a circuit before and after retiming, with registers marked; clock period = 3 units and clock period = 2 units]

Circuit Retiming
To preserve the functionality of the circuit, registers can only be moved in certain ways:
[Figure: legal register moves under retiming]

Circuit Retiming
Given a circuit, how should we place the registers to minimize the clock period?

Traditional Approach
The retiming problem was first introduced in the following classical paper:
"Retiming Synchronous Circuitry", Charles E. Leiserson and James B. Saxe, Algorithmica, 6:5-35, 1991
- Only gate delay was considered.
- Three methods are proposed. One of them solves the problem by mixed integer linear programming (MILP).

Traditional Approach
Notation:
- d(v) is the delay of node v.
- w(e) is the original number of registers on edge e.
- c is the clock period whose feasibility we want to check.
- r(v) is the retiming value of node v, i.e., the number of registers moved from the output to the input of node v. (r(v) is what we want to find.)
- s(v) is the longest delay from a register, along a register-free path into node v, to the output of v.

Traditional Approach
More about s(v):
[Figure: point A at a register output, node v, and point B at the output of v]
s(v) is the delay from point A to point B, including the delay of v.

Traditional Approach
Integer Linear Program:
(1) $d(v) \le s(v)$  for all nodes $v$
(2) $s(v) \le c$  for all nodes $v$
(3) $r(u) - r(v) \le w(e)$  for all edges $e(u,v)$
(4) $s(u) - s(v) \le -d(v)$  for every edge $e(u,v)$ such that $r(u) - r(v) = w(e)$

Traditional Approach
Write $R(v) = r(v) + s(v)/c$. The ILP can then be written as an MILP:
(1) $r(v) - R(v) \le -d(v)/c$  for all nodes $v$
(2) $R(v) - r(v) \le 1$  for all nodes $v$
(3) $r(u) - r(v) \le w(e)$  for all edges $e(u,v)$
(4) $R(u) - R(v) \le w(e) - d(v)/c$  for all edges $e(u,v)$
This set of difference constraints can be solved in polynomial time, even though it contains both integer variables r(v) and real variables R(v).
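
To make the notation concrete, the following minimal Python sketch evaluates a given retiming r under the gate-delay-only model. It checks constraint (3) through the retimed register count w(e) + r(v) - r(u), then computes s(v) as the longest register-free path delay ending at v; the largest s(v) is the clock period. The function name `retimed_clock_period`, the tuple layout of `edges`, and the three-gate example circuit are assumptions made for illustration; they are not taken from the slides or from the Leiserson-Saxe paper.

```python
from collections import defaultdict, deque

def retimed_clock_period(d, edges, r):
    """d: {node: gate delay}; edges: list of (u, v, w); r: {node: int}.

    Returns the clock period after applying r, or None if some edge would
    carry a negative number of registers (constraint (3) violated) or if
    a register-free cycle exists.
    """
    zero_succ = defaultdict(list)   # edges left without registers
    indeg = {v: 0 for v in d}
    for u, v, w in edges:
        wr = w + r[v] - r[u]        # registers on e(u, v) after retiming
        if wr < 0:
            return None
        if wr == 0:
            zero_succ[u].append(v)
            indeg[v] += 1

    s = dict(d)                     # s(v) >= d(v), constraint (1)
    queue = deque(v for v in d if indeg[v] == 0)
    visited = 0
    while queue:                    # topological sweep over register-free edges
        u = queue.popleft()
        visited += 1
        for v in zero_succ[u]:
            s[v] = max(s[v], s[u] + d[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if visited != len(d):
        return None                 # register-free cycle
    return max(s.values())

# Hypothetical ring A -> B -> C -> A with unit gate delays and
# two registers on the edge C -> A.
d = {"A": 1, "B": 1, "C": 1}
edges = [("A", "B", 0), ("B", "C", 0), ("C", "A", 2)]
print(retimed_clock_period(d, edges, {"A": 0, "B": 0, "C": 0}))  # 3
print(retimed_clock_period(d, edges, {"A": 0, "B": 0, "C": 1}))  # 2
```

A clock period c is feasible for r exactly when the returned value is at most c, which is constraint (2). In the example, r(C) = 1 moves one register from the output of C to its input, shortening the longest register-free path from three gates to two.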

Traditional Approach
Use binary search to find the optimal clock period:
T0 = 0
T1 = 1e10   // a large number
Repeat
    c = (T0 + T1)/2
    Check whether c is a feasible clock period by solving the MILP.
    If success, T1 = c; otherwise, T0 = c.
Until success and (T1 - T0)/T1 < ε

Retiming with Interconnect Delay
- We consider clock period minimization.
- Retiming has been studied and applied extensively in logic synthesis.
- However, most previous retiming algorithms ignore interconnect delay.
- Interconnect delay should be considered for high-performance circuits in deep-submicron (DSM) design.
- This solution is going to be presented in the upcoming ICCAD 2003.

Retiming with Interconnect Delay
We assume that the delay of a wire is directly proportional to its length. This assumption is reasonable:
- For short wires, the quadratic component of the wire delay is significantly smaller than its linear component.
- For long wires, buffer insertion can be done.

Retiming with Interconnect Delay
Now, a retiming solution needs to specify:
- the retiming label r(v) for each node v;
- the positions of the registers on each edge.
[Figure: retiming example with node labels r( ) = 0, r( ) = 0, r( ) = -1, r( ) = 0]
The positions of the registers on the edges matter because the wires have delay.

Our Contributions
Optimal algorithm:
- $O(|V||E| \log|V| + |V|^2 \log^2|V|)$ time per iteration.
Near-optimal algorithm:
- Only 0.13% larger than the optimal on average.
- $O(|V_b||E| + |V_b||E_h|)$ time per iteration, e.g., a circuit with 16.1K gates and 28.6K wires can be retimed in 44.32s by a 1.8GHz PIII PC.
- Based on an optimal algorithm handling interconnect delay only, i.e., no gate delay.

Optimal Approach
Rewrite the ILP on p.8 as follows, where d(e) is the delay of edge e:
(1) $d(v) \le s(v)$  for all nodes $v$
(2) $s(v) \le c$  for all nodes $v$
(3) $r(u) - r(v) \le w(e)$  for all edges $e(u,v)$
(4) $s(v) \ge s(u) + d(e) + d(v) - c\,(r(v) - r(u) + w(e))$  for all edges $e(u,v)$

Optimal Approach
Similarly, write $R(v) = r(v) + s(v)/c$:
(1) $r(v) - R(v) \le -d(v)/c$  for all nodes $v$
(2) $R(v) - r(v) \le 1$  for all nodes $v$
(3) $r(u) - r(v) \le w(e)$  for all edges $e(u,v)$
(4) $R(u) - R(v) \le w(e) - d(v)/c - d(e)/c$  for all edges $e(u,v)$
Again, this set of constraints can be solved in polynomial time, though the runtime is quite long.

Optimal Approach
Circuit   |V|     |E|     c_opt   Runtime (s)
s1488     655     1405    18.85   5.62
s1494     649     1411    20.78   4.37
s3271     1574    2707    10.24   33.70
s3330     1791    2890    27.05   43.14
s3384     1687    2782    24.16   25.19
s4863     2344    4093    23.58   87.75
s5378     2781    4261    27.25   138.68
s6669     3082    5399    22.96   177.59
s9234     5599    8005    42.73   512.86
s13207    7953    11302   72.34   1161.07
s15850    9774    13794   67.82   1545.59
s35932    16067   28590   29.54   8644.27
s38417    22181   32135   36.52   7680.79
s38584    19255   33010   -       >15000

Near Optimal Approach
Transform the original graph G by splitting each node v (which represents a gate) into a pair of nodes v1 and v2 connected by an edge with delay d(v).
[Figure: node v (delay = d(v)) split into v1 and v2 (delay = 0) joined by an edge with delay = d(v)]

Near Optimal Approach
- After representing each gate by a wire, we can find an optimal retiming solution S for the transformed circuit G1. (We will show how to find the optimal solution with no gate delay.)
- The clock period of S is a lower bound L on the optimal solution Topt of G.
- From S, we can obtain a feasible retiming solution for the original circuit G.

Near Optimal Approach
- Registers retimed into a wire representing a gate v are moved backward to the input edges or forward to the output edges of v, depending on their distances from v1 and v2.
[Figure: registers on the wire between v1 and v2]
- Linear programming is then used to determine the positions of the registers on each edge after this relocation step, so as to minimize the clock period considering both gate and wire delay.
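
The node-splitting transformation that opens the near-optimal approach can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `split_gates`, the `(u, v, w, delay)` edge tuples, and the choice to give the new internal edge zero registers are assumptions, since the slides do not specify a data representation.

```python
def split_gates(gate_delay, edges):
    """Replace every gate v by nodes (v, 1) and (v, 2) joined by a wire.

    gate_delay: {gate: d(v)}
    edges:      list of (u, v, w, wire_delay) for the original graph G
    Returns the edge list of the transformed graph, in which all nodes
    have zero delay and every delay sits on an edge.
    """
    new_edges = []
    for v, dv in gate_delay.items():
        # Internal edge v1 -> v2 carries the gate delay; assumed to start
        # with no registers, since none sit inside a gate initially.
        new_edges.append(((v, 1), (v, 2), 0, dv))
    for u, v, w, de in edges:
        # An original wire now runs from the output side of u (u2)
        # to the input side of v (v1).
        new_edges.append(((u, 2), (v, 1), w, de))
    return new_edges

# Hypothetical two-gate example: gate a drives gate b through one register.
g1 = split_gates({"a": 2.0, "b": 3.0}, [("a", "b", 1, 0.5)])
for e in g1:
    print(e)
```

The transformed graph has 2|V| nodes and |V| + |E| edges, so any algorithm for the wire-delay-only problem can be run on it unchanged.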

Near Optimal Approach
It remains to solve the retiming problem optimally under the assumption that gate delay is zero.
When there is no gate delay, the set of constraints on p.17 becomes:
(1) $r(v) - R(v) \le 0$  for all nodes $v$
(2) $R(v) - r(v) \le 1$  for all nodes $v$
(3) $r(u) - r(v) \le w(e)$  for all edges $e(u,v)$
(4) $R(u) - R(v) \le w(e) - d(e)/c$  for all edges $e(u,v)$

Near Optimal Approach
Lemma 1: Given R(v) for all nodes v satisfying constraint (4), we can obtain a solution to constraints (1)-(4) by setting r(v) = trunc(R(v)).
Given Lemma 1, we only need to solve constraint (4):
$R(u) - R(v) \le w(e) - d(e)/c$  for all edges $e(u,v)$.
Consider the input graph G(V,E) in which the weight of each edge e(u,v) is $-w(e) + d(e)/c$.

Near Optimal Approach
- There is a solution to constraint (4) iff G has no positive cycles.
- Positive cycle detection in G can be achieved by positive cycle detection in a smaller graph $H(V_b, E_h)$ constructed from G.
- This technique can be applied to other positive cycle detection problems, not only to circuit retiming.
- After solving for R(v), we can find r(v) and s(v) for all nodes v.

Near Optimal Approach
After the binary search, we have the optimal clock period and the corresponding r(v) and s(v) for all nodes v. Then we can place the registers accordingly:
Assume that r(v) - r(u) + w(e) = 4.
[Figure: registers on edge (u, v), with spacings labelled c - s(u) and c]
Other registers are placed right in front of v.

Near Optimal Approach
First, assume that gate delay is zero:
- Binary search to find the minimum feasible clock period c.
- To test the feasibility of a fixed c:
    - Transform to a positive cycle detection problem on a reduced graph.
    - This can be solved by a single-source longest-path algorithm.

Results
Circuit   c_opt   c_near-opt   T_opt (s)   T_near-opt (s)
s1488     18.82   18.85        5.62        0.28
s1494     20.78   20.78        4.37        0.25
s3271     10.24   10.24        33.70       1.09
s3330     27.05   27.05        43.14       0.50
s3384     24.16   24.21        25.19       0.74
s4863     23.58   23.58        87.75       3.12
s5378     27.25   27.27        138.68      1.16
s6669     22.96   23.07        177.59      1.91
s9234     42.73   42.73        512.86      4.08
s13207    72.34   72.34        1161.07     8.11
s15850    67.82   67.82        1545.59     24.02
s35932    29.54   29.59        8644.27     61.25
s38417    36.52   36.53        7680.79     83.56
s38584    -       94.26        >15000      445.63

Future Directions
- Consider a more accurate model for the interconnect delay, e.g., Elmore delay.
- How can the retiming solution be mapped to the floorplanning or placement solution? Registers are large and take up silicon resources.
- How should fan-out capacitance be considered together with interconnect delay?
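
The zero-gate-delay feasibility test of the near-optimal approach (binary search over c, with each candidate checked by positive cycle detection) can be sketched as follows. This is a minimal sketch under stated assumptions, not the algorithm from the slides: it runs a plain Bellman-Ford longest-path relaxation on the full graph G rather than on the reduced graph H, whose construction the slides do not detail. The function names, the `(u, v, w, delay)` edge tuples, the total wire delay as the upper bound, and the two-node example are all hypothetical. Edge weights are scaled by c, i.e. d(e) - c·w(e), which has the same sign on every cycle as -w(e) + d(e)/c.

```python
import math

def has_positive_cycle(nodes, edges, c, tol=1e-9):
    """Longest-path Bellman-Ford from a virtual source joined to every node."""
    dist = {v: 0.0 for v in nodes}
    for _ in range(len(nodes)):
        changed = False
        for u, v, w, de in edges:
            cand = dist[u] + de - c * w     # scaled weight d(e) - c*w(e)
            if cand > dist[v] + tol:
                dist[v] = cand
                changed = True
        if not changed:
            return False, dist              # converged: no positive cycle
    return True, dist                       # still relaxing after |V| passes

def min_clock_period(nodes, edges, eps=1e-6, max_iter=100):
    """Binary search for the smallest feasible c; returns (c, r, s)."""
    lo, hi = 0.0, float(sum(e[3] for e in edges))
    if hi == 0.0:
        return 0.0, {v: 0 for v in nodes}, {v: 0.0 for v in nodes}
    # Assumption: every cycle carries a register, so the total wire
    # delay is always a feasible clock period.
    positive, best = has_positive_cycle(nodes, edges, hi)
    if positive:
        raise ValueError("some cycle carries no register")
    for _ in range(max_iter):
        if hi - lo <= eps * hi:
            break
        c = (lo + hi) / 2
        positive, dist = has_positive_cycle(nodes, edges, c)
        if positive:
            lo = c
        else:
            hi, best = c, dist
    # dist(v) = c * R(v) >= 0, so truncation and floor agree (Lemma 1).
    r = {v: math.floor(best[v] / hi) for v in nodes}
    s = {v: best[v] - hi * r[v] for v in nodes}
    return hi, r, s

# Hypothetical two-node loop: total wire delay 4, two registers.
nodes = ["u", "v"]
edges = [("u", "v", 1, 3.0), ("v", "u", 1, 1.0)]
c, r, s = min_clock_period(nodes, edges)
print(round(c, 3), r, s)
```

In the example the only cycle has total delay 4 and two registers, so no clock period below 2 is feasible and the search converges to c = 2.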