Optimality Study of Logic Synthesis for LUT-Based FPGAs
Jason Cong and Kirill Minkovich
VLSI CAD Lab
Computer Science Department
University of California, Los Angeles
Supported by Altera, Xilinx, and Magma under the California MICRO program.
Outline
 Motivation and background
 Current test cases suggested that existing algorithms had little room for improvement.
 LEKO
 Logic synthesis Examples with Known Optimals
 Creation, optimality, and results
 LEKU
 Logic synthesis Examples with Known Upper bounds
 Creation and results
 Conclusion
UCLA VLSICAD LAB
Goals of Paper
The goal was to test the optimality of two design steps in logic synthesis:
 Technology Mapping
 Logic Optimization combined with Technology Mapping
Definitions
 Technology Mapping
 Logic Optimization
 Logic Synthesis = Logic Optimization + Technology Mapping
Motivation
Logic synthesis is NP-hard in general
Combining logic optimization & mapping is much harder
 Academic tools mostly focus on mapping
Problems with current test cases
 How far from optimal?
 Logic optimization?
 Decrease in the number of FPGA synthesis papers
 Suggests fewer improvements possible
Why there is a need for new ones
 Test specific properties of logic synthesis tools
 LEKO & LEKU
Construction Overview (LEKO)

First create a small “core” graph, G5, with a known optimal mapping (and possibly a known logic synthesis) solution.

G5 has to have the following properties:
1. 5 inputs (x1,x2,…,x5)
2. 5 outputs (y1,y2,…,y5)
3. yi = f (x1,x2,…,x5), i.e. every output is a function of all five inputs.
4. Internal nodes have exactly two inputs.
5. There exists an optimal (in terms of area/depth) mapping of G5 into a 4-LUT mapping solution that has only 4-LUTs (no 3-LUTs or 2-LUTs).

Why these properties?
 Simplest G5 for 4-LUT architecture
 Can be cascaded into larger structures
G5 – example (optimal: seven 4-LUTs)
[Figure: example G5 graph. Legend: output nodes (O1–O5), internal nodes (N1–N24), input nodes (I1–I5)]
Node Values
O1 = N1 ∙ N10
O2 = N13 ∙ N14
O3 = N12 + N17
O4 = N13 ∙ N16
O5 = N9 + N11
N1 = I1 ∙ I2' + I2
N2 = N1 ∙ I3' + N1' ∙ I3
N3 = N1' ∙ N7'
N4 = N1 + N6
N5 = N3' + N4'
N6 = I2' + I5'
N7 = N1 ∙ N6
N8 = I3 ∙ I4' + I3' ∙ I4
N9 = N8 ∙ N2
N10 = N9 ∙ I5' + N9' ∙ I5
N11 = I5 ∙ N5
N12 = N18 ∙ I5
N13 = I1 + I1' ∙ I2
N14 = N15 + I2
N15 = N17 ∙ I5
N16 = N17 ∙ I5 + N17' ∙ I5'
N17 = N20 ∙ N19
N18 = N23' + N24'
N19 = N13 + N13' ∙ I3
N20 = I3 + I4
N21 = I2' + I5'
N22 = N13 ∙ N21
N23 = N13' ∙ N22'
N24 = N13 + N21
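
The node values above can be checked mechanically. The following is a minimal Python sketch, not part of the original slides, that transcribes the equations (reading ∙ as AND, + as OR, and ' as complement, with AND binding tighter than OR) and tests the property that every output is a function of all five inputs. The function names `g5` and `support` are hypothetical helpers introduced only for this illustration.

```python
from itertools import product

def g5(i1, i2, i3, i4, i5):
    """Evaluate the example G5; inputs and outputs are 0/1 integers."""
    inv = lambda x: 1 - x                      # x' in the slide notation
    n1 = (i1 & inv(i2)) | i2
    n2 = (n1 & inv(i3)) | (inv(n1) & i3)
    n6 = inv(i2) | inv(i5)
    n7 = n1 & n6
    n3 = inv(n1) & inv(n7)
    n4 = n1 | n6
    n5 = inv(n3) | inv(n4)
    n8 = (i3 & inv(i4)) | (inv(i3) & i4)
    n9 = n8 & n2
    n10 = (n9 & inv(i5)) | (inv(n9) & i5)
    n11 = i5 & n5
    n13 = i1 | (inv(i1) & i2)
    n19 = n13 | (inv(n13) & i3)
    n20 = i3 | i4
    n17 = n20 & n19
    n15 = n17 & i5
    n14 = n15 | i2
    n16 = (n17 & i5) | (inv(n17) & inv(i5))
    n21 = inv(i2) | inv(i5)
    n22 = n13 & n21
    n23 = inv(n13) & inv(n22)
    n24 = n13 | n21
    n18 = inv(n23) | inv(n24)
    n12 = n18 & i5
    # Primary outputs O1..O5
    return (n1 & n10, n13 & n14, n12 | n17, n13 & n16, n9 | n11)

def support(output_index):
    """Return the input positions (0..4) whose flip can change the given output."""
    deps = set()
    for bits in product((0, 1), repeat=5):
        for k in range(5):
            flipped = list(bits)
            flipped[k] ^= 1
            if g5(*bits)[output_index] != g5(*flipped)[output_index]:
                deps.add(k)
    return deps

# True when each of the five outputs depends on all five inputs.
full_support = all(support(j) == {0, 1, 2, 3, 4} for j in range(5))
print(full_support)
```

The check is exhaustive over all 32 input assignments, so it only confirms the functional-support property; it says nothing about whether seven 4-LUTs is the optimal mapping.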
Construction Overview (LEKO)

Algorithm Steps
1. Create a G5.
2. Duplicate it and connect the copies in such a way that there is a unique traversal of G5s from PO to PI.

This creates a new graph with the following properties:
 There exists a known optimal mapping solution
 This also provides a tight upper bound on the optimal logic synthesis solution

By using different G5s we can construct different LEKO networks with a variety of properties.
 G5 can have different mapping and logic synthesis solutions
 G5 can be based on realistic designs (multipliers, adders, etc.)
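
The slides do not spell out the exact interconnection, so the following Python sketch shows only one wiring that satisfies the stated unique-traversal property: a radix-5 butterfly, in which the G5 copies in layer l group the wires whose indices differ only in base-5 digit l. This pattern is an assumption made for illustration and is not necessarily the authors' construction; `leko_layers` and `count_paths` are hypothetical names.

```python
def leko_layers(k, m=5):
    """Return k layers of blocks; each block is the tuple of m wire indices it reads and drives.

    Assumed wiring (radix-m butterfly): a block in layer `level` groups the
    wires that differ only in base-m digit `level`.
    """
    n = m ** k
    layers = []
    for level in range(k):
        stride = m ** level
        starts = [w for w in range(n) if (w // stride) % m == 0]
        layers.append([tuple(w + d * stride for d in range(m)) for w in starts])
    return layers

def count_paths(layers, pi, po):
    """Count wire-level paths from primary input `pi` to primary output `po`."""
    n = sum(len(block) for block in layers[0])
    paths = [0] * n
    paths[pi] = 1
    for blocks in layers:
        nxt = [0] * n
        for block in blocks:
            # Every output of a G5 depends on all of its inputs.
            total = sum(paths[w] for w in block)
            for w in block:
                nxt[w] = total
        paths = nxt
    return paths[po]

layers = leko_layers(2)                       # 25 inputs and 25 outputs
num_blocks = sum(len(blocks) for blocks in layers)
unique = all(count_paths(layers, pi, po) == 1
             for pi in range(25) for po in range(25))
print(num_blocks, unique)
```

Under this assumed wiring, the 25-input network uses ten G5 copies in two layers, and every PI reaches every PO along exactly one path.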
Construction Examples (LEKO)
[Figure: example LEKO networks built from interconnected G5 blocks]
Optimality
Theorem: The optimal mapping solution of an arbitrarily sized LEKO
circuit without logic optimization is achieved when every G5 in the
circuit is mapped optimally without overlapping any other G5.
Proof Idea: A LUT spanning two layers will not reduce the area of the solution. This can be easily shown by looking at what would happen to the G5 at layer i and the G5 at layer i+1.
[Figure: a G5 at layer i and a G5 at layer i+1, with a 4-LUT and a 3-LUT marked]
 Complete proof is in the paper
LEKO Examples
LEKO – Logic synthesis Examples with Known Optimals
 Naming
• G25 has 25 inputs and 25 outputs
• Gx has x inputs and x outputs
Tools tested
 Altera’s Quartus 5.0, Xilinx’s ISE 7.1i, UCLA’s DAOmap and Berkeley’s ABC
 4-LUT architecture
 Area optimization only (NP-hard)
LEKO Circuits   # Nodes   Depth   # I/O   Optimal # LUTs   Optimal Depth
G25                 305      13      50               70               4
G125              2,350      20     225              525               6
G625             15,875      27   1,250            3,500               8
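
The optimal LUT counts and depths in the table are consistent with a simple layered count. The sketch below assumes (this is inferred from the numbers, not stated on the slides) that a LEKO circuit with 5^k inputs consists of k layers of 5^(k-1) G5 copies, that each copy maps to 7 LUTs as in the G5 example, and that each layer contributes a LUT depth of 2. The function name `leko_optimum` and its parameters are hypothetical.

```python
def leko_optimum(k, m=5, luts_per_core=7, depth_per_core=2):
    """Optimal area and depth of a k-layer LEKO circuit under the assumptions above."""
    cores = k * m ** (k - 1)          # k layers, each with m**(k-1) core graphs
    return {"luts": cores * luts_per_core, "depth": k * depth_per_core}

for name, k in (("G25", 2), ("G125", 3), ("G625", 4)):
    print(name, leko_optimum(k))
```

With these assumptions the sketch gives 70, 525, and 3,500 LUTs at depths 4, 6, and 8, matching the last two columns of the table.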
Results (LEKO)
Only mapping was needed to produce optimal results.

Circuits               DAOmap     ABC   Quartus     ISE   Optimal
LEKO(G25)    Area          83      80        72      80        70
             Ratio       1.19    1.14      1.03    1.14         1
LEKO(G125)   Area         650     609       561     588       525
             Ratio       1.24    1.16      1.07    1.12         1
LEKO(G625)   Area       4,435   4,072     3,737   3,974     3,500
             Ratio       1.27    1.16      1.07    1.14         1
Average Ratio            1.23    1.16      1.05    1.13         1

What do these mean?
 Scaled fairly well
 Average gap = 15%
Why did Quartus and ISE do so well?
 Performed extra non-mapping steps
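
The ratios in the LEKO results are each tool's area divided by the optimal area, averaged per tool over the three circuits. A small Python sketch of that arithmetic, using only the area figures from the table (the variable and function names are illustrative):

```python
# Areas (4-LUT counts) for LEKO(G25), LEKO(G125), LEKO(G625), in that order.
areas = {
    "DAOmap":  [83, 650, 4435],
    "ABC":     [80, 609, 4072],
    "Quartus": [72, 561, 3737],
    "ISE":     [80, 588, 3974],
}
optimal = [70, 525, 3500]

def ratios(tool):
    """Area of the tool's result divided by the known optimal area, per circuit."""
    return [a / o for a, o in zip(areas[tool], optimal)]

def average_ratio(tool):
    """Mean of the unrounded per-circuit ratios."""
    r = ratios(tool)
    return sum(r) / len(r)

for tool in areas:
    print(tool, [round(r, 2) for r in ratios(tool)], round(average_ratio(tool), 2))
```

Averaging the unrounded ratios and rounding at the end reproduces the Average Ratio row.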
Creating LEKU
LEKU – Logic synthesis Examples with Known Upper bounds
 Constructed from LEKO G25 (25 inputs and 25 outputs)
• Collapse then decompose the graph
• Creates a much larger graph that is logically equivalent to the original
• LEKU-CD – collapsed → decomposed into AND/OR gates
• LEKU-CB – collapsed → balanced
 LEKU-CD’
• LEKU-CD was too large for Xilinx as a single input
• Split LEKU-CD into 25 separate designs, one for each PO
                                              Upper Bound on Optimal
Circuits         # Nodes     Depth   # I/O    # LUTs    Depth
LEKU-CD(G25)     1,166,655      19      50        70        4
LEKU-CB(G25)           814      16      50        70        4
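
To illustrate why collapsing and then decomposing hides the original structure while preserving the logic, here is a toy Python sketch. It collapses a function to its canonical sum of minterms and counts the 2-input AND/OR gates of a naive chain decomposition. This stands in for the idea only: the slides do not say which collapse or decomposition procedure was used, inverters are ignored, and the names `collapse_to_sop`, `decompose`, and `eval_sop` are hypothetical.

```python
from itertools import product

def collapse_to_sop(func, n):
    """Collapse: list every input assignment (minterm) on which func evaluates to 1."""
    return [bits for bits in product((0, 1), repeat=n) if func(*bits)]

def decompose(minterms, n):
    """Decompose the sum of minterms into 2-input gates; return (and_count, or_count)."""
    ands = len(minterms) * (n - 1)       # a chain of n-1 ANDs per n-literal product
    ors = max(len(minterms) - 1, 0)      # a chain of ORs over all products
    return ands, ors

def eval_sop(minterms, bits):
    """Evaluate the collapsed form, to confirm it is logically equivalent."""
    return int(tuple(bits) in set(minterms))

# Toy example: 3-input majority.
majority = lambda a, b, c: int(a + b + c >= 2)
terms = collapse_to_sop(majority, 3)
gate_counts = decompose(terms, 3)
equivalent = all(eval_sop(terms, bits) == majority(*bits)
                 for bits in product((0, 1), repeat=3))
print(len(terms), gate_counts, equivalent)
```

The collapsed network computes the same function but no longer exposes any shared intermediate nodes, so a synthesis tool has to rediscover the compact structure on its own.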
Results on LEKU
Logic optimization and mapping were both needed.
 Academic tools were allowed to use preprocessing tools

Circuits                           DAOmap      ABC   Quartus     ISE   Upper Bound
LEKU-CD(G25)     Area              22,717   30,511    10,381       *            70
                 Ratio                325      436       148       *             1
LEKU-CD(G25)’    Area              25,247   35,271     5,005   9,717            70
                 Ratio                361      504        72     139             1
LEKU-CB(G25)     Area                 322      191       239     280            70
                 Ratio                4.6      2.7       3.4       4             1
Average Ratio (last 2 designs)        183      255        38      72             1
Average Ratio (ALL)                   230      314        74       *             1

What does this mean?
 There exist designs on which these tools perform very badly
 Average gap = 171x
 Suggests that all of these tools lack global minimization heuristics
LEKO/LEKU vs Real Designs
 Limitations
 Whole circuit is combinational logic
 Contain highly repeated structures in the original circuits
 Doesn’t mean tools are 70x away from optimal on real designs
 Different uses than real designs
 LEKO
• Test mapping phase of algorithm
  - Tools perform well on current LEKO benchmarks
  - Will construct larger core graphs → worse results?
 LEKU
• Test logic optimization phase of algorithm
  - Ability to reproduce original structure
  - Duplication removal
  - Logic identification
  - Other global heuristics
Conclusions
 Conclusions
 LEKO
• Only circuits that test optimality of technology mapping
• Have an optimal mapping solution
 LEKU
• Test global area minimizing heuristics
• Have a very tight upper bound on optimal solution
 These circuits address a need for specific method testing
 Current state of technology
 Technology Mapping
• Current tools do very well
 Overall Logic Synthesis
• Current tools cannot produce good solutions when global minimization heuristics are required.
Conclusions (continued)
 Download every test case mentioned here
 http://cadlab.cs.ucla.edu/
• Click on “Optimality Study”
• Click on “LEKO/LEKU”
 Harder and larger LEKO and LEKU circuits will be posted soon!
 Check out the article in EE Times
 Just search EE Times for “kirill”
 Thank you EE Times for your interest!
 http://eetimes.com/showArticle.jhtml?articleID=180204087
 Questions?
Additional Slides
Construction Algorithm (LEKO)
Variations
 LEKO
 Using larger core graphs to create more complex designs
 Using commonly used cells as the core graphs
 Using a collection of core graphs
 LEKU
 Using LEKO and adding in specific things to test
• Duplicating some specific parts
• Adding wires that will be removed when DON’T CARES are computed
Interesting New Results
 After seeing the results we got several responses
 ABC
• Repeating “map 4-LUTs → don’t care calculation” led to a 3x improvement on the largest LEKU example
 DAOmap
• Multiple iterations of “map 5-LUTs → simplify → map 4-LUTs” showed similar improvements on the LEKU examples
 Altera
• For the LEKO circuits, “map 5-LUT → map 4-LUT” was able to achieve near-optimal solutions
• This result wouldn’t extend if we used a larger G5
Different G5s

Assuming a K-LUT architecture, G5 has to have the following properties:
1. It has m inputs and m outputs.
2. Every output is a function of all m inputs.
3. Each internal node of G5 has exactly two inputs.
4. There exists an optimal (in terms of area/depth) mapping of G5 into a K-LUT mapping solution, denoted M5, such that M5 only has K-LUTs.
Where
 m ≥ K+1
 The larger the m, the harder the G5 is to map