COGRE: A Configuration Memory Reduced Reconfigurable Logic Cell Architecture for Area Minimization

20th International Conference on Field Programmable Logic and Applications
Milano, Italy, Aug. 31st – Sep. 2nd, 2010

Yasuhiro Okamoto, Yoshihiro Ichinomiya, Motoki Amagasaki, Masahiro Iida and Toshinori Sueyoshi
Kumamoto University, Japan

Background

Expansion of design scale
Long time to market
High manufacturing costs
High design costs
To solve these problems: Reconfigurable Logic Devices (RLDs)
However, RLDs cannot achieve performance as good as that of ASICs.
We propose a novel logic cell for improving efficiency.

Motivation

Look-up tables (LUTs) are employed as the conventional logic cell.
[Figure: 4-input LUT. Sixteen configuration memory bits (M) feed a tree of 2-input multiplexers selected by inputs A, B, C and D, producing OUT.]
An N-LUT can implement any N-input logic function.
We propose more compact logic cells by eliminating rarely used implementable logic functions.

Appearance Ratio of Logic Functions

Extract high-utilization-ratio logic functions.
Which logic functions are frequently used?
Functions found in a logic circuit, each with an unknown appearance ratio:
A+B(C+D) [?%]
AB+CD [?%]
ABCD [?%]
A(B+CD) [?%]
AB+C+D [?%]
We need to investigate the appearance ratio of logic functions.

NPN Equivalence Class

NPN operations over logic functions:
(N) Input negation
(P) Input permutation
(N) Output negation

Example, starting from A+BC:
(N): Negate A, B, C, giving ¬A + ¬B¬C
(P): Permute A and B, giving ¬B + ¬A¬C
(N): Negate the output, giving B(A+C)
A+BC and B(A+C) are NPN equivalent: the two functions belong to the same NPN equivalence class.

Investigation of Logic Appearance Ratio

Target architecture: 4-input LUT
Mapping tool: priority-cuts based mapping tool (implemented on ABC)
Evaluation circuits: 20 MCNC benchmark circuits
Flow: benchmark circuits, technology mapping, logic extraction
• alu4 • apex2 • apex4 • bigkey • clma • des • diffeq • dsip • elliptic • ex1010 • ex5p • frisc • misex3 • pdc • s298 • s38417 • s38584.1 • seq • spla • tseng

Investigation Results

Rank | Logic Function (NPN) | Appearance Ratio [%]
1 | ABCD | 27.2
2 | AB(C+D) | 17.0
3 | A(B+C+D) | 13.7
4 | AB+CD | 12.8
5 | A(B+CD) | 12.0
6 | A(BC+BD) | 5.6
- | Others | 11.7

The top six functions account for 88.3% in total.
A small portion of the NPN classes (6 of 222) accounts for a large share.

5-Input COGRE Architecture

COGRE: Compactly Organized Generic Reconfigurable Element
[Figure: 5-input COGRE with inputs A to E, programmable inverters, and output OUT. M: configuration memory bit.]
Implementable logic functions: ABCD, AB(C+D), A(B+C+D), AB+CD, A(B+CD), A(BC+BD)
93.4% of the logic functions that appear are covered (circuits mapped to 4-LUTs or smaller LUTs).
Compact logic block architecture compared to the 4-LUT.

Example of Logic Implementation

[Figure: implementation of "A+B+C+D" on the 5-input COGRE.]
Input sharing or constant assignment is necessary to implement 4-input functions.

NPN Structure

[Figure: inputs IN-1 to IN-N and the output OUT each pass through a programmable NOT controlled by a configuration memory bit M.]
We need P-NOTs for the inputs and the output.
P-NOT: Programmable NOT

6-Input COGRE Architecture

[Figure: 6-input COGRE with inputs A to F and output OUT; the added elements are marked.]
NANDs and P-NOTs are added to extend the logic cell.
Implementable logic functions: ABCDEF, AB(C+D+E+F), ABCD(E+F), (A+B)(C+D)(E+F), AB(C+D)(E+F), AB+CDEF
50.6% of the logic functions that appear are covered (circuits mapped to 6-LUTs or smaller LUTs).

Number of Configuration
Memory Bits

COGREs are small-memory logic cells.

Logic cell | # of configuration memory bits
6-COGRE | 11
5-COGRE | 8
6-LUT | 64
5-LUT | 32
4-LUT | 16

The number of configuration memory bits required by the 6-COGRE is smaller than that of the 4-LUT.

Evaluation Method

Target architectures: 6-COGRE, 5-COGRE, 6-LUT, 5-LUT, 4-LUT
Evaluation items: logic area, routing area, total tracks, critical path delay, number of configuration memory bits
Evaluation circuits: 20 MCNC benchmark circuits
Evaluation flow: benchmarks, technology mapping (modified ABC), clustering (T-VPack), place and route (VPR 5.0), area and delay reports

Cluster Structure

[Figure: a cluster of four BLEs, each containing a COGRE and a D-FF, with input selection MUXes, output selection MUXes, and a GND input.]
A cluster has 4 COGREs and D-FFs.

Area and Delay Model

COGREs and LUTs were synthesized using Synopsys Design Compiler, targeting the e-Shuttle 1.2 V, 65 nm CMOS technology standard cell library.

Logic cell | BLE area [um^2] | BLE delay [ps] | BLE # of conf. bits | Cluster (4 BLEs) area [um^2] | Cluster delay [ps] | Cluster # of conf. bits
6-COGRE | 328.32 | 169.56 | 12 | 4308.48 | 304.49 | 168
5-COGRE | 194.88 | 131.68 | 9 | 3131.52 | 268.42 | 136
6-LUT | 1456.08 | 179.23 | 65 | 8819.52 | 314.16 | 380
5-LUT | 748.32 | 166.16 | 33 | 5345.28 | 281.63 | 212
4-LUT | 450.96 | 159.52 | 17 | 3347.52 | 292.46 | 132

These parameters are used in the architecture file in VPR 5.0.

Logic Area

[Bar chart: logic area normalized by 6-LUT, for 6-COGRE, 5-COGRE, 6-LUT, 5-LUT and 4-LUT.]
6-COGRE vs. 6-LUT: 46.3% smaller
5-COGRE vs. 5-LUT: 32.6% smaller
5-COGRE vs. 4-LUT: 10.0% smaller

Routing Area

[Bar chart: routing area normalized by 6-LUT, for 6-COGRE, 5-COGRE, 6-LUT, 5-LUT and 4-LUT.]
6-COGRE vs. 6-LUT: 9.66% larger
5-COGRE vs. 5-LUT: 11.8% larger
5-COGRE vs. 4-LUT: 12.7% larger

Total Number of Routing Tracks

[Bar chart: number of routing tracks normalized by 6-LUT, for 6-COGRE, 5-COGRE, 6-LUT, 5-LUT and 4-LUT.]
6-COGRE vs. 6-LUT: 4.12% larger
5-COGRE vs. 5-LUT: 10.0% larger
5-COGRE vs.
4-LUT: 4.83% larger

Critical Path Delay

[Bar chart: critical path delay normalized by 6-LUT, for 6-COGRE, 5-COGRE, 6-LUT, 5-LUT and 4-LUT.]
6-COGRE vs. 6-LUT: 6.96% larger
6-COGRE vs. 5-LUT: 3.50% smaller
5-COGRE vs. 5-LUT: 5.29% larger
5-COGRE vs. 4-LUT: 2.72% smaller

Total Number of Configuration Memory Bits

[Bar chart: number of configuration memory bits normalized by 6-LUT, for 6-COGRE, 5-COGRE, 6-LUT, 5-LUT and 4-LUT.]
6-COGRE vs. 6-LUT: 32.1% smaller
5-COGRE vs. 5-LUT: 13.2% smaller
5-COGRE vs. 4-LUT: 1.95% smaller

Conclusion

COGRE: a novel compact logic cell architecture
Focus on the appearance ratio of logic functions
Results
Logic area
6-COGRE vs. 6-LUT: 46.3% smaller
5-COGRE vs. 5-LUT: 32.6% smaller
Number of SRAMs
6-COGRE vs. 6-LUT: 32.1% smaller
5-COGRE vs. 5-LUT: 13.2% smaller

Thank you for your attention!

APPENDIX

Future Work

For performance improvement:
Increase the cluster size (from 4 to ?)
Combine COGREs and LUTs
Combine different COGREs
Further investigation

Total Area (Logic + Routing)

[Bar chart: total area normalized by 6-LUT, for 6-COGRE, 5-COGRE, 6-LUT, 5-LUT and 4-LUT.]
6-COGRE vs. 6-LUT: 39.6% smaller
5-COGRE vs. 5-LUT: 25.7% smaller
5-COGRE vs. 4-LUT: 5.66% smaller

NPN Equivalence

[Figure: 4-input function with inputs A, B, C, D and output OUT.]
For a 4-input function: $2^4$ input negations, $4!$ input permutations, $2^1$ output negations.
Maximum number of functions in an NPN class: $2^{N+1} N!$
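The NPN equivalence used throughout these slides can be checked by brute force for small input counts. The following is a minimal sketch, not the authors' tool: the truth-table encoding and the helper names (`apply_npn`, `npn_class`, `npn_canon`, `truth_table`, `count_npn_classes`, `max_class_size`) are assumptions made for illustration. It applies every combination of input permutation, input negation and output negation to a function and collects the results.

```python
from itertools import permutations, product
from math import factorial


def truth_table(f, n):
    """Encode an n-input Boolean function as an int; bit m holds f for minterm m."""
    tt = 0
    for m in range(1 << n):
        args = [(m >> i) & 1 for i in range(n)]
        tt |= (f(*args) & 1) << m
    return tt


def apply_npn(tt, n, perm, in_neg, out_neg):
    """Apply one NPN transform: permute inputs, negate inputs, negate the output."""
    result = 0
    for m in range(1 << n):
        src = 0
        for i in range(n):
            # Input i of the original function reads (possibly negated) input perm[i].
            bit = ((m >> perm[i]) & 1) ^ in_neg[i]
            src |= bit << i
        result |= (((tt >> src) & 1) ^ out_neg) << m
    return result


def npn_class(tt, n):
    """All truth tables reachable from tt by NPN operations."""
    return {
        apply_npn(tt, n, perm, in_neg, out_neg)
        for perm in permutations(range(n))
        for in_neg in product((0, 1), repeat=n)
        for out_neg in (0, 1)
    }


def npn_canon(tt, n):
    """Representative of the class: the smallest truth table in it (an arbitrary choice)."""
    return min(npn_class(tt, n))


def count_npn_classes(n):
    """Count NPN classes over all n-input functions (brute force, small n only)."""
    seen, classes = set(), 0
    for tt in range(1 << (1 << n)):
        if tt not in seen:
            seen |= npn_class(tt, n)
            classes += 1
    return classes


def max_class_size(n):
    """Upper bound on class size: 2^(N+1) * N!."""
    return 2 ** (n + 1) * factorial(n)


if __name__ == "__main__":
    f1 = truth_table(lambda a, b, c: a | (b & c), 3)  # A+BC
    f2 = truth_table(lambda a, b, c: b & (a | c), 3)  # B(A+C)
    print(npn_canon(f1, 3) == npn_canon(f2, 3))       # True
    print(count_npn_classes(3))                       # 14
    print(max_class_size(4))                          # 768
```

Running `count_npn_classes(4)` with this sketch should reproduce the 222 classes cited in the investigation results, although the brute-force loop takes noticeably longer for four inputs.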
Various COGREs

[Figure: two example 6-COGRE structures, 6-COGRE (Ex1) and 6-COGRE (Ex2).]

Array Size (1)

Circuits: alu4, apex2, apex4, bigkey, clma, des, diffeq, dsip, elliptic, ex1010
Architectures: 6COGRE, 5COGRE, 6LUT, 5LUT, 4LUT
Array sizes: 12x12 14x14 12x12 13x13 14x14 15x15 13x13 107x107 16x16 14x14 15x15 17x17 14x14 13x13 14x14 15x15 107x107 107x107 107x107 107x107 36x36 126x126 14x14 36x36 36x36 36x36 36x36 126x127 126x126 126x126 126x126 14x14 12x12 13x13 14x14 107x107 10x10 14x14 107x107 107x107 107x107 107x107 11x11 10x10 10x10 11x11 16x16 14x14 15x15 18x18

Array Size (2)

Circuit | 6COGRE | 5COGRE | 6LUT | 5LUT | 4LUT
ex5p | 18x18 | 18x18 | 18x18 | 18x18 | 18x18
frisc | 34x34 | 34x34 | 34x34 | 34x34 | 34x34
misex3 | 13x13 | 14x14 | 12x12 | 13x13 | 14x14
pdc | 24x24 | 23x23 | 21x21 | 24x24 | 25x25
s298 | 4x4 | 4x4 | 3x3 | 3x3 | 4x4
s38417 | 34x34 | 34x34 | 34x34 | 34x34 | 34x36
s38584.1 | 86x86 | 86x86 | 86x86 | 86x86 | 86x86
seq | 19x19 | 19x19 | 19x19 | 19x19 | 19x19
spla | 23x23 | 23x23 | 21x21 | 23x23 | 25x25
tseng | 44x44 | 44x44 | 44x44 | 44x44 | 44x44

Area Model in VPR 5.0

[Figure: 2 x 2 array of tiles, each with a logic block (LB), a switch box (SB) and connection boxes (CB); three tiles are used and one is unused.]
Array size: 2 x 2
Logic Area = 2 x 2 x (Logic Area Per Tile)
Routing Area = 2 x 2 x (Routing Area Per Tile)
LB: Logic Block
SB: Switch Box
CB: Connection Box

The Structure of Island-Style FPGA

[Figure: island-style FPGA with I/O pads, switch boxes, connection boxes and configurable logic blocks; one logic block computes A+B from inputs A and B.]
We focus on the logic block architecture.
FPGA: Field Programmable Gate Array

The Number of Used Transistors (LUT)

Logical Element | # of Transistors
1-Bit SRAM | 6
2-Input MUX | 4
Inverter | 2

Logical Element | 4-LUT | 5-LUT
1-Bit SRAM | 16×6 | 32×6
2-Input MUX | 15×4 | 31×4
Inverter | 4×2 | 5×2
Total | 164 | 326

The Number of Used Transistors (COGRE)

Logical Element | # of Transistors
2-Input NAND | 4
1-Bit SRAM | 6
Programmable Inverter | 6

Logical Element | 5-COGRE | 6-COGRE
2-Input NAND | 4×4 | 5×4
1-Bit SRAM | 8×6 | 11×6
Programmable Inverter | 8×6 | 11×6
Total | 112 | 152

What is RLD?

[Figure: comparison of ASIC and RLD in terms of flexibility, area, speed and power, with example applications MPEG-2, MP3, RSA and CPU.]
With an RLD, we are able to implement any application.
We aim to increase the performance of RLDs.
ASIC: Application Specific Integrated Circuit
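The transistor totals in the appendix tables follow directly from the per-element costs, and the arithmetic can be replayed in a few lines. This is a sketch under stated assumptions: the function names are hypothetical, and the LUT formula generalizes the 4-LUT and 5-LUT rows as 2^N SRAM bits, 2^N - 1 two-input MUXes and N inverters, which the slides only show for N = 4 and N = 5.

```python
# Per-element transistor costs taken from the appendix tables.
SRAM_BIT = 6
MUX2 = 4
INVERTER = 2
NAND2 = 4
PROG_INVERTER = 6


def lut_transistors(n):
    """N-input LUT: 2^N SRAM bits, 2^N - 1 two-input MUXes, N inverters (assumed pattern)."""
    return (2 ** n) * SRAM_BIT + (2 ** n - 1) * MUX2 + n * INVERTER


def cogre_transistors(nands, conf_bits):
    """COGRE: NAND gates plus one SRAM bit and one programmable inverter per configuration bit."""
    return nands * NAND2 + conf_bits * SRAM_BIT + conf_bits * PROG_INVERTER


if __name__ == "__main__":
    print(lut_transistors(4))        # 164
    print(lut_transistors(5))        # 326
    print(cogre_transistors(4, 8))   # 112 (5-COGRE)
    print(cogre_transistors(5, 11))  # 152 (6-COGRE)
```

The counts match the table totals, which makes the source of the saving visible: the COGRE cost grows with the number of configuration bits (8 or 11), while the LUT cost grows with 2^N.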