
COGRE: A Configuration Memory Reduced
Reconfigurable Logic Cell Architecture
for Area Minimization
20th International Conference
on Field Programmable Logic and Applications
Milano, ITALY, Aug. 31st – Sep. 2nd, 2010
Yasuhiro Okamoto, Yoshihiro Ichinomiya, Motoki Amagasaki,
Masahiro Iida and Toshinori Sueyoshi
Kumamoto University, Japan
Background
• Expansion of design scale
• Long time to market
• High manufacturing costs
• High design costs
To solve these problems …
Reconfigurable Logic Device (RLD)
However …
RLDs cannot match the performance of ASICs
We propose a novel logic cell to improve efficiency
Motivation
Look-up tables (LUTs) are employed as conventional logic cells
[Figure: 4-Input LUT; 16 configuration memory bits (M) feed a tree of 2-input multiplexers selected by inputs A, B, C, D to produce OUT]
An N-LUT can implement any N-input logic function
We propose more compact logic cells by eliminating rarely used implementable logic functions
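To make the cost of a LUT concrete, the following minimal Python sketch models an N-input LUT as a list of 2^N configuration bits addressed by the input values. The names `make_lut` and `lut_eval` and the bit ordering (first input as most significant address bit) are illustrative assumptions, not taken from the slides.

```python
from itertools import product

def make_lut(func, n):
    """Return the 2**n configuration bits that make an n-LUT compute func."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=n)]

def lut_eval(config, inputs):
    """Read the configuration bit addressed by the inputs (first input = MSB)."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return config[index]

# Example: program a 4-LUT with AB+CD.
config = make_lut(lambda a, b, c, d: (a and b) or (c and d), 4)
print(len(config))                     # 16 configuration memory bits
print(lut_eval(config, (1, 1, 0, 0)))  # 1
print(lut_eval(config, (1, 0, 0, 1)))  # 0
```

Every 4-input function costs the same 16 bits in this model, however simple it is, which is the overhead the proposal targets.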
Appearance Ratio of Logic Functions
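Extract the logic functions with a high utilization ratio
Which logic functions are frequently used?
[Figure: a Logic Circuit and the logic functions it contains, each with an unknown appearance ratio]
A+B(C+D) [ ?% ]
AB+CD [ ?% ]
ABCD [ ?% ]
A(B+CD) [ ?% ]
AB+C+D [ ?% ]
We need to investigate the appearance ratio of logic functions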
NPN Equivalence Class
Example: A+BC and B(A+C) are NPN equivalent (' denotes complement)
A+BC
(N) : Negate A, B, C → A'+B'C'
(P) : Permute A and B → B'+A'C'
(N) : Negate the output → B(A+C)
NPN operations on logic functions
(N) Input Negation
(P) Input Permutation
(N) Output Negation
Two functions belong to the same NPN equivalence class when these operations transform one into the other
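The equivalence test can be sketched by brute force: enumerate every input permutation, input negation and output negation of one function and check whether the other function's truth table appears. This is only an illustrative sketch for small N; the helper names are assumptions, and it is not the method used by the mapping tool.

```python
from itertools import permutations, product

def truth_table(func, n):
    """Truth table of an n-input function as a tuple of 0/1 values."""
    return tuple(int(bool(func(*bits))) for bits in product((0, 1), repeat=n))

def npn_class(func, n):
    """All truth tables reachable by the three NPN operations."""
    variants = set()
    for perm in permutations(range(n)):
        for neg in product((0, 1), repeat=n):
            for out_neg in (0, 1):
                def g(*x, perm=perm, neg=neg, out_neg=out_neg):
                    y = [x[perm[i]] ^ neg[i] for i in range(n)]
                    return func(*y) ^ out_neg
                variants.add(truth_table(g, n))
    return variants

def npn_equivalent(f, g, n):
    return truth_table(g, n) in npn_class(f, n)

f = lambda a, b, c: a | (b & c)   # A+BC
g = lambda a, b, c: b & (a | c)   # B(A+C)
h = lambda a, b, c: a ^ b ^ c     # 3-input parity, a different class
print(npn_equivalent(f, g, 3))    # True
print(npn_equivalent(f, h, 3))    # False
```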
Investigation of Logic Appearance Ratio
Target Architecture
• 4-input LUT
Mapping Tool
• Priority-cuts based mapping tool (implemented on ABC)
Evaluation Circuits
• 20 MCNC benchmark circuits: alu4, apex2, apex4, bigkey, clma, des, diffeq, dsip, elliptic, ex1010, ex5p, frisc, misex3, pdc, s298, s38417, s38584.1, seq, spla, tseng
Flow: Benchmark Circuits → Technology Mapping → Logic Extraction
Investigation Results
Rank  Logic Function (NPN)  Appearance Ratio [%]
1     ABCD                  27.2
2     AB(C+D)               17.0
3     A(B+C+D)              13.7
4     AB+CD                 12.8
5     A(B+CD)               12.0
6     A(BC+BD)              5.6
-     Others                11.7
The top 6 functions account for 88.3%
A small portion of the NPN classes (6 / 222) accounts for a large share
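The 88.3% figure is the running total of the six ranked ratios. A short Python check, using only the numbers on the slide (the function and variable names are illustrative):

```python
# Appearance ratios [%] as reported on the slide (NPN class representatives).
RATIOS = [("ABCD", 27.2), ("AB(C+D)", 17.0), ("A(B+C+D)", 13.7),
          ("AB+CD", 12.8), ("A(B+CD)", 12.0), ("A(BC+BD)", 5.6)]
OTHERS = 11.7

def cumulative_coverage(ratios):
    """Return (function, ratio, running total) rows, best-ranked first."""
    rows, total = [], 0.0
    for name, ratio in ratios:
        total += ratio
        rows.append((name, ratio, round(total, 1)))
    return rows

rows = cumulative_coverage(RATIOS)
for rank, (name, ratio, total) in enumerate(rows, start=1):
    print(f"{rank}  {name:<10} {ratio:5.1f}  {total:5.1f}")
print(f"top 6 + others = {rows[-1][2] + OTHERS:.1f}")
```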
5-Input COGRE Architecture
COGRE : Compactly Organized Generic Reconfigurable Element
[Figure: 5-input COGRE cell with inputs A, B, C, D, E, programmable inverters, configuration memory bits (M), and output OUT]
M : Configuration Memory Bit
Implementable logic functions
• ABCD
• AB(C+D)
• A(B+C+D)
• AB+CD
• A(B+CD)
• A(BC+BD)
Covers 93.4% of the logic functions that appear (in circuits mapped to 4-LUTs or smaller LUTs)
More compact logic block architecture than the 4-LUT
Example of Logic Implementation
[Figure: implementation of A+B+C+D on the 5-input COGRE; input A is applied to two of the five inputs]
Input sharing or constant assignment is necessary to implement 4-input functions
NPN Structure
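[Figure: logic cell with inputs IN-1, IN-2, …, IN-N and output OUT; a configuration memory bit (M) controls a P-NOT on each input and on the output]
P-NOTs are needed on every input and on the output
P-NOT : Programmable NOT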
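As a rough illustration, a P-NOT can be modeled as an XOR of a signal with one configuration memory bit. Wrapping a fixed core function with P-NOTs on its N inputs and its output then gives N+1 bits that select among the negation variants of that core. The XOR model, the names `p_not` and `npn_cell`, and the NAND core below are assumptions for this sketch, not the actual COGRE circuit; input permutation is assumed to be handled by the routing.

```python
def p_not(signal, mem_bit):
    """Programmable NOT: invert the signal when the configuration bit is 1."""
    return signal ^ mem_bit

def npn_cell(core, in_bits, out_bit):
    """Wrap a fixed core function with P-NOTs on each input and on the output."""
    def cell(*inputs):
        conditioned = [p_not(x, m) for x, m in zip(inputs, in_bits)]
        return p_not(core(*conditioned), out_bit)
    return cell

nand2 = lambda a, b: 1 - (a & b)

# Configure a NAND core as OR: invert both inputs, keep the output.
or2 = npn_cell(nand2, in_bits=(1, 1), out_bit=0)
# Configure the same core as AND: keep the inputs, invert the output.
and2 = npn_cell(nand2, in_bits=(0, 0), out_bit=1)

print([or2(a, b) for a in (0, 1) for b in (0, 1)])   # [0, 1, 1, 1]
print([and2(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 0, 0, 1]
```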
6-Input COGRE Architecture
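[Figure: 6-input COGRE cell with inputs A, B, C, D, E, F, configuration memory bits (M), and output OUT; the added elements are marked "Added"]
Implementable logic functions
• ABCDEF
• AB(C+D+E+F)
• ABCD(E+F)
• (A+B)(C+D)(E+F)
• AB(C+D)(E+F)
• AB+CDEF
Covers 50.6% of the logic functions that appear (in circuits mapped to 6-LUTs or smaller LUTs)
NANDs and P-NOTs are added to extend the logic cell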
Number of Configuration Memory Bits
COGREs are small-memory logic cells
Logic cell  # of configuration memory bits
6-COGRE     11
5-COGRE     8
6-LUT       64
5-LUT       32
4-LUT       16
The # of configuration memory bits required by the 6-COGRE is smaller than that of the 4-LUT
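The LUT entries follow from the 2^N rule, while the COGRE entries are the values reported on the slide. A small Python comparison (the names are illustrative):

```python
def lut_bits(n):
    """An n-input LUT stores one configuration bit per truth-table row."""
    return 2 ** n

COGRE_BITS = {5: 8, 6: 11}  # values reported on the slide

for n in (4, 5, 6):
    print(f"{n}-LUT:   {lut_bits(n)} bits")
for n, bits in COGRE_BITS.items():
    share = bits / lut_bits(n)
    print(f"{n}-COGRE: {bits} bits ({share:.1%} of the {n}-LUT)")

# The 6-COGRE needs fewer bits than even the 4-LUT.
print(COGRE_BITS[6] < lut_bits(4))  # True
```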
Evaluation Method
Target Architectures
• 6-COGRE
• 5-COGRE
• 6-LUT
• 5-LUT
• 4-LUT
Evaluation Items
• Logic Area
• Routing Area
• Total Tracks
• Critical Path Delay
• # of Configuration Memory Bits
Evaluation Circuits
• 20 MCNC benchmark circuits
Evaluation Flow
Benchmarks → Technology Mapping (Modified ABC) → Clustering (T-VPack) → Place and Routing (VPR 5.0) → Area and Delay Reports
Cluster Structure
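[Figure: logic cluster built from Input Selection MUXes (with GND as a selectable input) and 4 BLEs; each BLE contains a COGRE, a D-FF, and an Output Selection MUX; the figure is annotated with the cluster parameters N, I, and K]
A cluster has 4 COGREs and D-FFs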
Area and Delay Model
Synthesize COGREs and LUTs using Synopsys Design Compiler
targeting the e-Shuttle 1.2V, 65nm CMOS technology standard cell library
         BLE                                        Logic Cluster (4 BLEs)
         Area [um^2]  Delay [ps]  # of conf. bits   Area [um^2]  Delay [ps]  # of conf. bits
6-COGRE  328.32       169.56      12                4308.48      304.49      168
5-COGRE  194.88       131.68      9                 3131.52      268.42      136
6-LUT    1456.08      179.23      65                8819.52      314.16      380
5-LUT    748.32       166.16      33                5345.28      281.63      212
4-LUT    450.96       159.52      17                3347.52      292.46      132
These parameters are used in the architecture file in VPR 5.0
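One way to read the area columns is to subtract four BLE areas from each cluster area. The remainder is the part of the cluster outside the BLEs; attributing it to the selection MUXes is an assumption of this sketch, as are the names used. The numbers are the ones on the slide.

```python
# name: (BLE area [um^2], cluster area [um^2]) from the slide
AREAS = {
    "6-COGRE": (328.32, 4308.48),
    "5-COGRE": (194.88, 3131.52),
    "6-LUT": (1456.08, 8819.52),
    "5-LUT": (748.32, 5345.28),
    "4-LUT": (450.96, 3347.52),
}
BLES_PER_CLUSTER = 4

def cluster_overhead(name):
    """Cluster area not accounted for by its BLEs."""
    ble, cluster = AREAS[name]
    return cluster - BLES_PER_CLUSTER * ble

for name in AREAS:
    print(f"{name:8s} overhead = {cluster_overhead(name):8.2f} um^2")

# Area ratio COGRE/LUT at BLE level and at cluster level (6-input cells).
ble_ratio = AREAS["6-COGRE"][0] / AREAS["6-LUT"][0]
cluster_ratio = AREAS["6-COGRE"][1] / AREAS["6-LUT"][1]
print(f"BLE ratio {ble_ratio:.3f}, cluster ratio {cluster_ratio:.3f}")
```

In these numbers the remainder is the same for a COGRE and a LUT with the same input count, so the saving at BLE level is diluted at cluster level.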
Logic Area
[Chart: Logic Area (Normalized by 6-LUT) for 6-COGRE, 5-COGRE, 6-LUT, 5-LUT, and 4-LUT]
6-COGRE vs. 6-LUT : 46.3% smaller
5-COGRE vs. 5-LUT : 32.6% smaller
5-COGRE vs. 4-LUT : 10.0% smaller
Routing Area
[Chart: Routing Area (Normalized by 6-LUT) for 6-COGRE, 5-COGRE, 6-LUT, 5-LUT, and 4-LUT]
6-COGRE vs. 6-LUT : 9.66% larger
5-COGRE vs. 5-LUT : 11.8% larger
5-COGRE vs. 4-LUT : 12.7% larger
Number of Routing Tracks
[Chart: Total Number of Routing Tracks (Normalized by 6-LUT) for 6-COGRE, 5-COGRE, 6-LUT, 5-LUT, and 4-LUT]
6-COGRE vs. 6-LUT : 4.12% larger
5-COGRE vs. 5-LUT : 10.0% larger
5-COGRE vs. 4-LUT : 4.83% larger
Critical Path Delay
[Chart: Critical Path Delay (Normalized by 6-LUT) for 6-COGRE, 5-COGRE, 6-LUT, 5-LUT, and 4-LUT]
6-COGRE vs. 6-LUT : 6.96% larger
6-COGRE vs. 5-LUT : 3.50% smaller
5-COGRE vs. 5-LUT : 5.29% larger
5-COGRE vs. 4-LUT : 2.72% smaller
# of Configuration Memory Bits
[Chart: Total Number of Configuration Memory Bits (Normalized by 6-LUT) for 6-COGRE, 5-COGRE, 6-LUT, 5-LUT, and 4-LUT]
6-COGRE vs. 6-LUT : 32.1% smaller
5-COGRE vs. 5-LUT : 13.2% smaller
5-COGRE vs. 4-LUT : 1.95% smaller
Conclusion
COGRE : A novel compact logic cell architecture
Focuses on the appearance ratio of logic functions
Results
• Logic Area
6-COGRE vs. 6-LUT : 46.3% smaller
5-COGRE vs. 5-LUT : 32.6% smaller
• The Number of SRAMs
6-COGRE vs. 6-LUT : 32.1% smaller
5-COGRE vs. 5-LUT : 13.2% smaller
Thank you for your attention !!
APPENDIX
Future Work
For performance improvement …
• Increase the cluster size (4 → ?)
• Combine COGREs and LUTs
• Combine different COGREs
• Further investigation
Total Area (Logic + Routing)
[Chart: Total Area (Normalized by 6-LUT) for 6-COGRE, 5-COGRE, 6-LUT, 5-LUT, and 4-LUT]
6-COGRE vs. 6-LUT : 39.6% smaller
5-COGRE vs. 5-LUT : 25.7% smaller
5-COGRE vs. 4-LUT : 5.66% smaller
NPN Equivalence
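[Figure: 4-Input Function with inputs A, B, C, D and output OUT]
4-Input Function : 4! input permutations × 2^4 input negations × 2^1 output negations
Maximum # of functions in an NPN class : 2^(N+1) × N!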
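The bound can be evaluated directly. The sketch below (illustrative names) computes it for N = 4 and compares it with the total number of 4-input functions.

```python
from math import factorial

def max_npn_class_size(n):
    """Upper bound on the number of functions in one NPN class of n-input functions."""
    return 2 ** (n + 1) * factorial(n)

n = 4
size = max_npn_class_size(n)     # 2^5 * 4!
functions = 2 ** (2 ** n)        # number of distinct 4-input functions
print(size)                      # 768
print(functions)                 # 65536
print(-(-functions // size))     # 86, a lower bound on the number of classes
```

Many classes are smaller than the bound because symmetric functions map to themselves under some operations, which is consistent with the 222 classes quoted earlier in the slides.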
Various COGREs
[Figures: 6-COGRE (Ex1) and 6-COGRE (Ex2)]
Array Size(1)
          6COGRE   5COGRE   6LUT     5LUT     4LUT
alu4      12x12    14x14    12x12    13x13    14x14
apex2     15x15    16x16    14x14    15x15    17x17
apex4     13x13    14x14    13x13    14x14    15x15
bigkey    107x107  107x107  107x107  107x107  107x107
clma      36x36    36x36    36x36    36x36    36x36
des       126x126  126x127  126x126  126x126  126x126
diffeq    14x14    14x14    12x12    13x13    14x14
dsip      107x107  107x107  107x107  107x107  107x107
elliptic  10x10    11x11    10x10    10x10    11x11
ex1010    14x14    16x16    14x14    15x15    18x18
Array Size(2)
          6COGRE  5COGRE  6LUT   5LUT   4LUT
ex5p      18x18   18x18   18x18  18x18  18x18
frisc     34x34   34x34   34x34  34x34  34x34
misex3    13x13   14x14   12x12  13x13  14x14
pdc       24x24   23x23   21x21  24x24  25x25
s298      4x4     4x4     3x3    3x3    4x4
s38417    34x34   34x34   34x34  34x34  34x36
s38584.1  86x86   86x86   86x86  86x86  86x86
seq       19x19   19x19   19x19  19x19  19x19
spla      23x23   23x23   21x21  23x23  25x25
tseng     44x44   44x44   44x44  44x44  44x44
Area Model in VPR 5.0
[Figure: 2 x 2 array of tiles; each tile contains a Logic Block (LB), Connection Boxes (CB), and a Switch Box (SB), and is marked Used or Unused]
Array Size : 2 x 2
Logic Area = 2 x 2 x (Logic Area Per Tile)
Routing Area = 2 x 2 x (Routing Area Per Tile)
LB : Logic Block
SB : Switch Box
CB : Connection Box
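The two formulas generalize to any array size: every tile of the array is counted, whether or not it is used. A minimal Python sketch follows; the function name and the per-tile values in the example are hypothetical placeholders, not figures from the evaluation.

```python
def array_area(nx, ny, logic_area_per_tile, routing_area_per_tile):
    """Area of an nx x ny array: every tile counts, used or unused."""
    tiles = nx * ny
    logic = tiles * logic_area_per_tile
    routing = tiles * routing_area_per_tile
    return {"logic": logic, "routing": routing, "total": logic + routing}

# 2 x 2 array as on the slide; the per-tile areas are hypothetical.
area = array_area(2, 2, logic_area_per_tile=100.0, routing_area_per_tile=250.0)
print(area)  # {'logic': 400.0, 'routing': 1000.0, 'total': 1400.0}
```

Because the array is square in most results, one extra row and column of tiles raises both area terms even when only a few logic blocks are added.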
The Structure of Island-Style FPGA
[Figure: island-style FPGA with I/O Pads, Switch Boxes, Connection Boxes, and Configurable Logic Blocks; one logic block computes A+B from inputs A and B]
We focus on logic block architecture
FPGA : Field Programmable Gate Array
The Number of Used Transistors(LUT)
Logical Element  # of Transistors
1-Bit SRAM       6
2-Input MUX      4
Inverter         2
Transistors per cell
             4-LUT  5-LUT
1-Bit SRAM   16×6   32×6
2-Input MUX  15×4   31×4
Inverter     4×2    5×2
Total        164    326
[Figure: 4-LUT]
The Number of Used Transistors(COGRE)
5-COGRE 6-COGRE
2-Input NAND
4×4
5×4
1-Bit SRAM
8×6
11×6
Programmable
Inverter
8×6
11×6
Total
112
152
Logical Element
# of Transistors
2-Input NAND
4
1-Bit SRAM
6
Programmable Inverter
6
5-COGRE
32
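The totals in the LUT and COGRE transistor tables can be reproduced from the per-element costs. In the Python sketch below, the element counts for the COGREs are the ones on the slides; the general LUT formula (2^N SRAM bits, 2^N - 1 MUXes, N inverters) is an assumption that matches the 4-LUT and 5-LUT columns.

```python
# Transistors per logical element, as listed on the slides.
COST = {"sram": 6, "mux2": 4, "inv": 2, "nand2": 4, "pnot": 6}

def lut_elements(n):
    """Element counts of an n-LUT (assumed general form of the slide's columns)."""
    return {"sram": 2 ** n, "mux2": 2 ** n - 1, "inv": n}

CELLS = {
    "4-LUT": lut_elements(4),
    "5-LUT": lut_elements(5),
    "5-COGRE": {"nand2": 4, "sram": 8, "pnot": 8},
    "6-COGRE": {"nand2": 5, "sram": 11, "pnot": 11},
}

def transistors(elements):
    """Total transistor count of a cell from its element counts."""
    return sum(COST[kind] * count for kind, count in elements.items())

for name, elements in CELLS.items():
    print(f"{name:8s} {transistors(elements)}")
# 4-LUT 164, 5-LUT 326, 5-COGRE 112, 6-COGRE 152
```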
What is RLD?
[Figure: ASIC and RLD compared on Flexibility, Area, Speed, and Power; example applications: MPEG-2, MP3, RSA, CPU]
We are able to implement any application on an RLD
We aim to increase the performance of RLDs
ASIC : Application Specific Integrated Circuit