Optimality Study of Logic Synthesis for LUT-Based FPGAs

Jason Cong and Kirill Minkovich
VLSI CAD Lab, Computer Science Department
University of California, Los Angeles

Supported by Altera, Xilinx, and Magma under the California MICRO program.

Outline
• Motivation and background
  • Current test cases hinted that the algorithms have little room for improvement.
• LEKO: Logic synthesis Examples with Known Optimals
  • Creation, optimality, and results
• LEKU: Logic synthesis Examples with Known Upper bounds
  • Creation and results
• Conclusion

UCLA VLSICAD LAB

Goals of Paper
• The goal was to test the optimality of two design steps in logic synthesis:
  • Technology mapping
  • Logic optimization combined with technology mapping

Definitions
• Technology Mapping
• Logic Optimization
• Logic Synthesis = Logic Optimization + Technology Mapping
[Figure: three example networks, each with inputs a, b, c, d, e and output f]

Motivation
• Logic synthesis is NP-hard in general
  • Combining logic optimization and mapping is much harder
  • Academic tools mostly focus on mapping
• Problems with current test cases
  • How far from optimal?
  • Logic optimization?
• Decrease in FPGA synthesis papers
  • Suggests fewer improvements are possible
• Why there is a need for new test cases
  • Test specific properties of logic synthesis tools
  • LEKO & LEKU

Construction Overview (LEKO)
• First create a small “core” graph, G5, with a known optimal mapping (and possibly a logic synthesis) solution.
• G5 has to have the following properties:
  1. 5 inputs (x1, x2, …, x5)
  2. 5 outputs (y1, y2, …, y5)
  3. yi = f(x1, x2, …, x5)
  4. Internal nodes have exactly two inputs.
  5. There is an optimal (in terms of area/depth) mapping of G5 into a 4-LUT mapping solution that has only 4-LUTs (no 3-LUTs or 2-LUTs).
[Figure: G5 block with inputs x1..x5 and outputs y1..y5]
• Why these properties?
  • Simplest G5 for a 4-LUT architecture
  • Can be cascaded into larger structures

G5 – example (optimal: 7 4-LUTs)
[Figure: G5 example graph with input nodes I1..I5, internal nodes N1..N24, and output nodes O1..O5; legend: output node, internal node, input node]

Node Values
O1 = N1 ∙ N10
O2 = N13 ∙ N14
O3 = N12 + N17
O4 = N13 ∙ N16
O5 = N9 + N11
N1 = I1 ∙ I2' + I2
N2 = N1 ∙ I3' + N1' ∙ I3
N3 = N1' ∙ N7'
N4 = N1 + N6
N5 = N3' + N4'
N6 = I2' + I5'
N7 = N1 ∙ N6
N8 = I3 ∙ I4' + I3' ∙ I4
N9 = N8 ∙ N2
N10 = N9 ∙ I5' + N9' ∙ I5
N11 = I5 ∙ N5
N12 = N18 ∙ I5
N13 = I1 + I1' ∙ I2
N14 = N15 + I2
N15 = N17 ∙ I5
N16 = N17 ∙ I5 + N17' ∙ I5'
N17 = N20 ∙ N19
N18 = N23' + N24'
N19 = N13 + N13' ∙ I3
N20 = I3 + I4
N21 = I2' + I5'
N22 = N13 ∙ N21
N23 = N13' ∙ N22'
N24 = N13 + N21

Construction Overview (LEKO)
• Algorithm steps
  1. Create a G5.
  2. Duplicate it and connect the copies in such a way that there is a unique traversal of G5s from PO to PI.
• This creates a new graph with the following properties:
  • There exists a known optimal mapping solution.
  • This also provides a tight upper bound on the optimal logic synthesis solution.
• By using different G5s we can construct different LEKO networks with a variety of properties.
  • G5 can have different mapping and logic synthesis solutions.
  • G5 can be based on realistic designs (multipliers, adders, etc.).

Construction Examples (LEKO)
[Figure: G5 blocks cascaded in layers to form larger LEKO circuits]

Optimality
• Theorem: The optimal mapping solution of an arbitrarily sized LEKO circuit without logic optimization is achieved when every G5 in the circuit is mapped optimally without overlapping any other G5.
• Proof idea: A LUT spanning two layers will not reduce the area of the solution.
  • This can be shown by looking at what would happen to a G5 at layer i and a G5 at layer i+1.
  [Figure: a LUT spanning G5 blocks at layer i and layer i+1; labels: 3-LUT, 4-LUT]
  • The complete proof is in the paper.

LEKO Examples
• LEKO – Logic synthesis Examples with Known Optimals
• Naming
  • G25 has 25 inputs and 25 outputs
  • Gx has x inputs and x outputs
• Tools tested
  • Altera’s Quartus 5.0, Xilinx’s ISE 7.1i, UCLA’s DAOmap, and Berkeley’s ABC
• 4-LUT architecture
• Area optimization only (NP-hard)

                 LEKO                        Optimal
Circuit    # Nodes   Depth   # I/O       # LUTs   Depth
G25            305      13      50           70       4
G125         2,350      20     225          525       6
G625        15,875      27   1,250        3,500       8

Results (LEKO)
• Only mapping is needed to produce optimal results.

Circuit               DAOmap     ABC   Quartus     ISE   Optimal
LEKO(G25)    Area         83      80        72      80        70
             Ratio      1.19    1.14      1.03    1.14         1
LEKO(G125)   Area        650     609       561     588       525
             Ratio      1.24    1.16      1.07    1.12         1
LEKO(G625)   Area      4,435   4,072     3,737   3,974     3,500
             Ratio      1.27    1.16      1.07    1.14         1
Average Ratio           1.23    1.16      1.05    1.13         1

• What do these mean?
  • Scaled fairly well
  • Average gap = 15%
• Why Quartus and ISE did so well
  • Performed extra non-mapping steps

Creating LEKU
• LEKU – Logic synthesis Examples with Known Upper bounds
• Constructed from LEKO G25 (25 inputs and 25 outputs)
  • Collapse, then decompose the graph
  • Creates a much larger graph that is logically equivalent to the original
  • LEKU-CD – collapsed, then decomposed into AND/OR gates
  • LEKU-CB – collapsed, then balanced
• LEKU-CD’
  • LEKU-CD was too large for Xilinx as a single input
  • Split LEKU-CD into 25 separate designs, one for each PO

                                                Upper Bound on Optimal
Circuit          # Nodes     Depth   # I/O      # LUTs   Depth
LEKU-CD(G25)     1,166,655      19      50          70       4
LEKU-CB(G25)           814      16      50          70       4

Results on LEKU
• Both logic optimization and mapping were needed
• Academic tools were allowed to use preprocessing tools

Circuit                      DAOmap      ABC   Quartus     ISE   Upper Bound
LEKU-CD(G25)     Area        22,717   30,511    10,381       *            70
                 Ratio          325      436       148       *             1
LEKU-CD(G25)’    Area        25,247   35,271     5,005   9,717            70
                 Ratio          361      504        72     139             1
LEKU-CB(G25)     Area           322      191       239     280            70
                 Ratio          4.6      2.7       3.4       4             1
Average Ratio (last 2 designs)  183      255        38      72             1
Average Ratio (ALL)             230      314        74       *             1

• What does this mean?
  • There exist designs on which these tools perform very badly
  • Average gap = 171x
  • Suggests that all of these tools lack global minimization heuristics

LEKO/LEKU vs Real Designs
• Limitations
  • The whole circuit is combinational logic
  • The original circuits contain highly repeated structures
  • Doesn’t mean tools are 70x away from optimal on real designs
• Different uses than real designs
  • LEKO
    • Tests the mapping phase of the algorithm
    • Tools perform well on current LEKO benchmarks
    • Will construct larger core graphs: worse results?
  • LEKU
    • Tests the logic optimization phase of the algorithm
    • Ability to reproduce the original structure
    • Duplication removal
    • Logic identification
    • Other global heuristics

Conclusions
• LEKO
  • Only circuits that test optimality of technology mapping
  • Have an optimal mapping solution
• LEKU
  • Test global area-minimizing heuristics
  • Have a very tight upper bound on the optimal solution
• These circuits address a need for testing specific methods
• Current state of technology
  • Technology mapping
    • Current tools do very well
  • Overall logic synthesis
    • Current tools cannot produce good solutions when a global minimization heuristic is required.

Conclusions (continued)
• Download every test case mentioned here
  • http://cadlab.cs.ucla.edu/
  • Click on “Optimality Study”
  • Click on “LEKO/LEKU”
  • Harder and larger LEKO and LEKU circuits will be posted soon!
• Check out the article in EE Times
  • Just search EE Times for “kirill”
  • Thank you EE Times for your interest!
  • http://eetimes.com/showArticle.jhtml?articleID=180204087

Questions?
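Cross-check of the LEKO numbers (sketch)
The optimal LUT counts and area ratios in the LEKO tables earlier in the talk can be re-derived with a few lines of Python. This is only a sketch: the layering used in `layered_optimal_area` (k layers of 5^(k-1) non-overlapping G5 blocks, 7 LUTs each) is an assumption chosen because it reproduces the reported optima, not a description of the paper's actual construction, and all function names are illustrative.

```python
# Areas reported in the LEKO results table: circuit -> (optimal, {tool: area}).
LEKO_RESULTS = {
    "G25": (70, {"DAOmap": 83, "ABC": 80, "Quartus": 72, "ISE": 80}),
    "G125": (525, {"DAOmap": 650, "ABC": 609, "Quartus": 561, "ISE": 588}),
    "G625": (3500, {"DAOmap": 4435, "ABC": 4072, "Quartus": 3737, "ISE": 3974}),
}


def layered_optimal_area(k, luts_per_g5=7, width=5):
    """LUT count if every G5 is mapped optimally and none overlap.

    Assumption (not from the slides): a circuit with width**k inputs is built
    from k layers, each holding width**(k-1) copies of G5.
    """
    return luts_per_g5 * k * width ** (k - 1)


def ratios(circuit):
    """Area ratio of each tool to the known optimum, rounded as in the table."""
    optimal, areas = LEKO_RESULTS[circuit]
    return {tool: round(area / optimal, 2) for tool, area in areas.items()}


def average_ratio(tool):
    """Mean of the unrounded ratios of one tool over all three circuits."""
    values = [areas[tool] / optimal for optimal, areas in LEKO_RESULTS.values()]
    return round(sum(values) / len(values), 2)


if __name__ == "__main__":
    for k, name in enumerate(LEKO_RESULTS, start=2):
        print(name, layered_optimal_area(k), ratios(name))
    for tool in ("DAOmap", "ABC", "Quartus", "ISE"):
        print(tool, average_ratio(tool))
```

Under that assumed layering, the known optimum is just 7 LUTs times the number of G5 blocks, which is what the optimality theorem implies when no LUT spans two blocks.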
Additional Slides

Construction Algorithm (LEKO)

Variations
• LEKO
  • Using larger core graphs to create more complex designs
  • Using commonly used cells as the core graphs
  • Using a collection of core graphs
• LEKU
  • Using LEKO and adding in specific things to test
    • Duplicating some specific parts
    • Adding wires that will be removed when don’t-cares are computed

Interesting New Results
• After seeing the results, we got several responses
• ABC
  • Repeating the sequence (map to 4-LUTs, don’t-care calculation) led to a 3x improvement on the largest LEKU example
• DAOmap
  • Multiple iterations of (map to 5-LUTs, simplify, map to 4-LUTs) showed similar improvements on the LEKU examples
• Altera
  • For LEKO, mapping to 5-LUTs followed by mapping to 4-LUTs was able to achieve near-optimal solutions
  • This result wouldn’t extend if we used a larger G5

Different G5s
• Assuming a K-LUT architecture, G5 has to have the following properties:
  1. It has m inputs and m outputs.
  2. Every output is a function of all m inputs.
  3. Each internal node of G5 has exactly two inputs.
  4. There exists an optimal (in terms of area/depth) mapping of G5 into a K-LUT mapping solution, denoted M5, such that M5 has only K-LUTs.
• Where m ≥ K+1
• The larger the m, the harder the G5 is to map
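Checking the G5 properties (sketch)
Properties 2 and 3 above can be checked by brute force on the G5 example from the main talk (m = 5). The sketch below assumes the node equations on the “G5 – example” slide are transcribed correctly; the names `G5`, `evaluate`, `depends_on`, and `check_g5` are illustrative and not from the paper. Property 4 (an optimal mapping that uses only K-LUTs) is not checked here.

```python
from itertools import product

INPUTS = ["I1", "I2", "I3", "I4", "I5"]
OUTPUTS = ["O1", "O2", "O3", "O4", "O5"]

# node -> (first input, second input, two-input function), as on the slide.
G5 = {
    "O1": ("N1", "N10", lambda a, b: a and b),
    "O2": ("N13", "N14", lambda a, b: a and b),
    "O3": ("N12", "N17", lambda a, b: a or b),
    "O4": ("N13", "N16", lambda a, b: a and b),
    "O5": ("N9", "N11", lambda a, b: a or b),
    "N1": ("I1", "I2", lambda a, b: (a and not b) or b),
    "N2": ("N1", "I3", lambda a, b: (a and not b) or (not a and b)),
    "N3": ("N1", "N7", lambda a, b: (not a) and (not b)),
    "N4": ("N1", "N6", lambda a, b: a or b),
    "N5": ("N3", "N4", lambda a, b: (not a) or (not b)),
    "N6": ("I2", "I5", lambda a, b: (not a) or (not b)),
    "N7": ("N1", "N6", lambda a, b: a and b),
    "N8": ("I3", "I4", lambda a, b: (a and not b) or (not a and b)),
    "N9": ("N8", "N2", lambda a, b: a and b),
    "N10": ("N9", "I5", lambda a, b: (a and not b) or (not a and b)),
    "N11": ("I5", "N5", lambda a, b: a and b),
    "N12": ("N18", "I5", lambda a, b: a and b),
    "N13": ("I1", "I2", lambda a, b: a or (not a and b)),
    "N14": ("N15", "I2", lambda a, b: a or b),
    "N15": ("N17", "I5", lambda a, b: a and b),
    "N16": ("N17", "I5", lambda a, b: (a and b) or (not a and not b)),
    "N17": ("N20", "N19", lambda a, b: a and b),
    "N18": ("N23", "N24", lambda a, b: (not a) or (not b)),
    "N19": ("N13", "I3", lambda a, b: a or (not a and b)),
    "N20": ("I3", "I4", lambda a, b: a or b),
    "N21": ("I2", "I5", lambda a, b: (not a) or (not b)),
    "N22": ("N13", "N21", lambda a, b: a and b),
    "N23": ("N13", "N22", lambda a, b: (not a) and (not b)),
    "N24": ("N13", "N21", lambda a, b: a or b),
}


def evaluate(node, env, cache):
    """Value of a node under the input assignment env (memoized in cache)."""
    if node in env:
        return env[node]
    if node not in cache:
        a, b, fn = G5[node]
        cache[node] = bool(fn(evaluate(a, env, cache), evaluate(b, env, cache)))
    return cache[node]


def depends_on(output, var):
    """True if flipping var changes output for at least one assignment."""
    for bits in product([False, True], repeat=len(INPUTS)):
        env = dict(zip(INPUTS, bits))
        flipped = {**env, var: not env[var]}
        if evaluate(output, env, {}) != evaluate(output, flipped, {}):
            return True
    return False


def check_g5():
    """Return (property 3 holds, property 2 holds) for the example G5."""
    two_inputs = all(a != b for a, b, _ in G5.values())
    full_support = all(depends_on(o, v) for o in OUTPUTS for v in INPUTS)
    return two_inputs, full_support


if __name__ == "__main__":
    print(check_g5())
```

The exhaustive loop over 2^5 assignments is only practical because m is small; for a larger core graph a different dependency check would be needed.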