FPGA introduction lecture

EECS150 - Digital Design
Lecture 2 - Combinational Logic
Review and FPGAs
January 19, 2012
John Wawrzynek
Electrical Engineering and Computer Sciences
University of California, Berkeley
http://www-inst.eecs.berkeley.edu/~cs150
Spring 2012
EECS150 lec02-logic-review-FPGAs
Page 1
General Model for Synchronous Systems
•
All synchronous digital systems fit this model:
– Collections of combinational logic blocks and state elements connected by
signal wires. These form a directed graph with only two types of nodes
(although the graph need not be bi-partite.)
– Instead of simple registers, sometimes the state elements are large memory
blocks.
Spring 2012
EECS150 lec02-logic-review-FPGAs
Page 2
Outline
• Review of three representations for
combinational logic: truth tables, gate diagrams,
algebraic equations
– Laws of Boolean Algebra
– Canonical Forms
– Boolean Simplification
– multi-level logic
– NAND/NOR Networks
• Field Programmable Gate Arrays (FPGAs)
Introduction
Spring 2012
EECS150 - Lec02-logic-FPGA
Page 3
Combinational Logic (CL) Defined
yi = fi(x0 , . . . . , xn-1), where x, y are {0,1}.
Y is a function of only X.
• If we change X, Y will change immediately (well almost!).
• There is an implementation dependent delay from X to Y.
Spring 2012
EECS150 - Lec02-logic-FPGA
Page 4
CL Block Example #1
Boolean Equation:
y0 = (x0 AND not(x1))
OR (not(x0) AND x1)
y0 = x0x1' + x0'x1
Truth Table Description:
Gate Representation:
How would we prove that all three representations are equivalent?
Spring 2012
EECS150 - Lec02-logic-FPGA
Page 5
Boolean Algebra/Logic Circuits
• Why are they called “logic circuits”?
• Logic: The study of the principles of reasoning.
• The 19th Century Mathematician, George Boole, developed
a math. system (algebra) involving logic, Boolean Algebra.
• His variables took on TRUE, FALSE
• Later Claude Shannon (father of information theory)
showed (in his Master’s thesis!) how to map Boolean
Algebra to digital circuits:
• Primitive functions of Boolean Algebra:
Spring 2012
EECS150 - Lec02-logic-FPGA
Page 6
Relationship Among Representations
* Theorem: Any Boolean function that can be expressed as a truth table can be
written as an expression in Boolean Algebra using AND, OR, NOT.
How do we convert from one to the other?
Spring 2012
EECS150 - Lec02-logic-FPGA
Page 7
CL Block Example #2
• 4-bit adder:
•
Truth Table Representation:
R = A + B,
c is carry out
In general: 2n rows for n inputs.
256 rows!
Is there a more efficient (compact) way to specify this function?
Spring 2012
EECS150 - Lec02-logic-FPGA
Page 8
4-bit Adder Example
• Motivate the adder circuit
design by hand addition:
•
Add a1 and b1 as follows:
• Add a0 and b0 as follows:
carry to
next stage
r = a XOR b = a ⊕ b
c = a AND b = ab
Spring 2012
r = a ⊕ b ⊕ ci
co = ab + aci + bci
Page 9
EECS150 - Lec02-logic-FPGA
4-bit Adder Example
• In general:
ri = ai ⊕ bi ⊕ cin
cout = aicin + aibi + bicin = cin(ai + bi) + aibi
“Full adder cell”
• Now, the 4-bit adder:
“ripple” adder
Spring 2012
EECS150 - Lec02-logic-FPGA
Page 10
4-bit Adder Example
• Graphical Representation of FAcell
ri = ai ⊕ bi ⊕ cin
•
Alternative Implementation
(with 2-input gates):
ri = (ai ⊕ bi) ⊕ cin
cout = aicin + aibi + bicin
Spring 2012
Defined as:
Spring 2012
cout = cin(ai + bi) + aibi
EECS150 - Lec02-logic-FPGA
Page 11
Boolean Algebra
EECS150 - Lec02-logic-FPGA
Page 12
Logic Functions
Do the axioms hold?
– Ex: communitive law: a+b = b+a?
Spring 2012
EECS150 - Lec02-logic-FPGA
Page 13
Some Laws of Boolean Algebra
Duality: A dual of a Boolean expression is derived by interchanging OR
and AND operations, and 0s and 1s (literals are left unchanged).
Any law that is true for an expression is also true for its dual.
Operations with 0 and 1:
1. x + 0 = x
x*1=x
2. x + 1 = 1 x * 0 = 0
Idempotent Law:
3. x + x = x
x x=x
Involution Law:
4. (x’)’ = x
Laws of Complementarity:
5. x + x’ = 1 x x’ = 0
Commutative Law:
6. x + y = y + x x y = y x
Spring 2012
EECS150 - Lec02-logic-FPGA
Page 14
Laws of Boolean Algebra (cont.)
Associative Laws:
(x + y) + z = x + (y + z)
x y z = x (y z)
Distributive Laws:
x (y + z) = (x y) + (x z)
x +(y z) = (x + y)(x + z)
“Simplification” Theorems:
x y + x y’ = x
x+xy=x
(x + y) (x + y’) = x
x (x + y) = x
DeMorgan’s Law:
(x + y + z + …)’ = x’y’z’
(x y z …)’ = x’ + y’ +z’
Theorem for Multiplying and Factoring:
(x + y) (x’ + z) = x z + x’ y
Consensus Theorem:
x y + y z + x’ z = (x + y) (y + z) (x’ + z)
x y + x’ z = (x + y) (x’ + z)
Spring 2012
EECS150 - Lec02-logic-FPGA
Page 15
Proving Theorems via axioms of
Boolean Algebra
Ex: prove the theorem:
x y + x y’ = x
x y + x y’ = x (y + y’) distributive law
x (y + y’) = x (1)
x (1)
=x
complementary law
identity
Ex: prove the theorem:
x+xy
x+xy=x
= x 1 + x y identity
x 1 + x y = x (1 + y) distributive law
x (1 + y) = x (1)
identity
x (1)
identity
Spring 2012
=x
EECS150 - Lec02-logic-FPGA
Page 16
DeMorgan’s Law
(x + y)’ = x’ y’
(x y)’ = x’ + y’
Spring 2012
Exhaustive
Proof
Exhaustive
Proof
EECS150 - Lec02-logic-FPGA
Page 17
Relationship Among Representations
* Theorem: Any Boolean function that can be expressed as a truth table can be
written as an expression in Boolean Algebra using AND, OR, NOT.
How do we convert from one to the other?
Spring 2012
EECS150 - Lec02-logic-FPGA
Page 18
Canonical Forms
•
Standard form for a Boolean expression - unique algebraic expression
directly from a true table (TT) description.
•
Two Types:
* Sum of Products (SOP)
* Product of Sums (POS)
•
Sum of Products (disjunctive normal form, minterm expansion).
Example:
minterms
a’b’c’
a’b’c
a’bc’
a’bc
ab’c’
ab’c
abc’
abc
abc
000
001
010
011
100
101
110
111
f f’
01
01
01
10
10
10
10
10
Spring 2012
One product (and) term for each 1 in f:
f = a’bc + ab’c’ + ab’c +abc’ +abc
f’ = a’b’c’ + a’b’c + a’bc’
EECS150 - Lec02-logic-FPGA
Page 19
Canonical Forms
•
Product of Sums (conjunctive normal form, maxterm expansion). Example:
maxterms
a+b+c
a+b+c’
a+b’+c
a+b’+c’
a’+b+c
a’+b+c’
a’+b’+c
a’+b’+c’
abc
000
001
010
011
100
101
110
111
f f’
01
01
01
10
10
10
10
10
One sum (or) term for each 0 in f:
f = (a+b+c)(a+b+c’)(a+b’+c)
f’ = (a+b’+c’)(a’+b+c)(a’+b+c’)
(a’+b’+c)(a+b+c’)
Mapping from SOP to POS (or POS to SOP): Derive truth table then proceed.
Spring 2012
EECS150 - Lec02-logic-FPGA
Page 20
Algebraic Simplification Example
Ex: full adder (FA) carry out function (in
canonical form):
Cout = a’bc + ab’c + abc’ + abc
Spring 2012
EECS150 - Lec02-logic-FPGA
Page 21
Algebraic Simplification
Cout = a’bc + ab’c + abc’ + abc
= a’bc + ab’c + abc’ + abc + abc
= a’bc + abc + ab’c + abc’ + abc
= (a’ + a)bc + ab’c + abc’ + abc
= (1)bc + ab’c + abc’ + abc
= bc + ab’c + abc’ + abc + abc
= bc + ab’c + abc + abc’ + abc
= bc + a(b’ +b)c + abc’ +abc
= bc + a(1)c + abc’ + abc
= bc + ac + ab(c’ + c)
= bc + ac + ab(1)
= bc + ac + ab
Spring 2012
EECS150 - Lec02-logic-FPGA
Page 22
Multi-level Combinational Logic
 Example: reduced sum-of-products form
x = adf + aef + bdf + bef + cdf + cef + g
 Implementation in 2-levels with gates:
cost: 1 7-input OR, 6 3-input AND
=> 50 transistors
delay: 3-input AND gate delay + 7-input OR gate delay
 Factored form:
x = (a + b +c)(d + e)f + g
cost: 1 3-input OR, 2 2-input OR, 1 3-input AND
=> 20 transistors
delay: 3-input OR + 3-input AND + 2-input OR
Footnote: NAND would be used
in place of all ANDs and ORs.
Which is faster?
In general: Using multiple levels (more than 2) will reduce the cost. Sometimes also
delay.
Sometimes a tradeoff between cost and delay.
Spring 2012
Page 23
EECS150 - Lec02-logic-FPGA
Multi-level Combinational Logic
Another Example: F = abc + abd +a’c’d’ + b’c’d’
let x = ab y = c+d
f = xy + x’y’
Incorporates fanout.
No convenient hand methods exist for multi-level logic simplification:
a) CAD Tools use sophisticated algorithms and heuristics
b) Humans and tools often exploit some special structure (example adder)
Spring 2012
EECS150 - Lec02-logic-FPGA
Page 24
EXOR Function
Parity, addition mod 2
x ⊕ y = x’y + xy’
x y xor xnor
00
0
1
01
1
0
10
1
0
11
0
1
Another approach:
Spring 2012
EECS150 - Lec02-logic-FPGA
Page 25
Project platform: Xilinx ML505-110
Spring 2012
EECS150 - Lec02-logic-FPGA
Page 26
Two Generations of ASMBL
Application-Specific Modular BLock Architecture)
FPGA: Xilinx Virtex-5 XC5VLX110T
Virtex-5 “die photo”
Spring 2012
Page 27
EECS150 - Lec02-logic-FPGA
A die is an unpackaged part
R
!"#$%&'4
Chips, 2006 slide 7
commended
B Design
les
!"#$%&')
Serial ()*
Implementing Xilinx Flip-Chip BGA Packages
From die to PC board ...
Copper Heatspreader
Thermal Interface Material
Underfill Epoxy
Flip Chip Solder Bump
Ball Grid Adhesive Epoxy*
Array
Silicon Die
(BGA)
Two Generations of ASMBL
Flip-Chip
(Application-Specific Modular BLock Architecture)
Solder Ball
Organic Build-Up Substrate
Package
Figure 2: Package Construction with Type II Lid
*Xilinx flip-chip packages are not hermetically sealed and exposure to cleaning solvents/
chemicals or excessive moisture during boards assembly could pose serious package
reliability concerns. Small vents are kept by design between the heatspreader (lid) and the
organic substrate to allow for outgassing and moisture evaporation. Solvents or other corrosive
chemicals could seep through these vents and attack the organic materials and components
inside the package and hence are strongly discouraged during board assembly of Xilinx flipchip BGA packages.
Spring
2012 BGA packages, NSMD
EECS150
- Lec02-logic-FPGA
Page 28are suggested.
For
Xilinx
(Non
Solder Mask Defined)
pads on the board
Serial ()*
!"#$%&'4
This allows a clearance between the land metal
(diameter L) and the solder mask opening
!"#$%&')
(diameter M) as shown in Figure 3. The space between the NSMD pad and the solder mask,
Hot Chips, 2006 slide 7
and the
actual signal trace widths depends on the capability of the PCB vendor. The cost of the
PCB is higher when the line widths and spaces are tighter.
FPGA Overview
•
Basic idea: two-dimensional array of logic blocks and flip-flops with a means
for the user to configure (program):
1. the interconnection between the logic blocks,
2. the function of each block.
Simplified version of FPGA internal architecture:
Spring 2012
EECS150 - Lec02-logic-FPGA
Page 29
Why are FPGAs Interesting?
• Technical viewpoint:
– For hardware/system-designers, like ASICs
only better! “Tape-out” new design every few
minutes/hours.
– Does the “reconfigurability” or
“reprogrammability” offer other advantages
over fixed logic?
– Dynamic reconfiguration? In-field
reprogramming? Self-modifying hardware,
evolvable hardware?
Spring 2012
EECS150 lec02-logic-review-FPGAs
Page 30
Why are FPGAs Interesting?
• Staggering logic capacity growth (10000x):
Year Introduced
Device
Logic Cells
“logic gate
equivalents”
1985
XC2064
128
1024
2011
XC7V2000T
1,954,560
15,636,480
– FPGAs have tracked Moore’s Law better than
any other programmable device.
Spring 2012
Page 31
EECS150 lec02-logic-review-FPGAs
Why are FPGAs Interesting?
– Logic capacity now only part of the story: on-chip RAM,
high-speed I/Os, “hard” function blocks, ...
– Modern FPGAs are “reconfigurable systems”
Xilinx Virtex-5 LX110T
10GBps Serdes
Ethernet MACs
PCI express Phy
But, the heterogeneity
erodes the “purity”
argument. Mapping is
more difficult.
Introduces uncertainty in
efficiency of solution.
64
148 36Kb SRAM Blocks
Spring 2012
EECS150 lec02-logic-review-FPGAs
Page 32
FPGAs are in widespread use
Far more designs are implemented
in FPGA than in custom chips.
FPGAs Power Net-Centric
Battlefield on Many Fronts
Xcell
INSIDE
Automotive Innovators
Hit
HitHigh
Top Gear in
Driver Assistance
with FPGA Platforms
Make MicroBlaze
Processing Roar With
Hardware Acceleration
FPGAs Help CERN Track
Particles Approaching
Speed of Light
Hardware Trumps Software
in Medical Device Design
Plugging into
High-Volume
Consumer
Products
Taming Power Draw in
Consumer MPUs
INSIDE
Algorithm Developers Power
New DA System on Xilinx
Automotive FPGA Platform
Spring 2012
Engineer Turns Blown
Engine into Hot Startup
HIGH VOLUME
How to Beat Your Son
at Guitar Hero
Using Xilinx FPGA
Multimedia for Automotive
Tips and Tricks for
Using FPGA Editor
and SystemVerilog
DESIGN TOOLS
CS 150 - Lec02-logic-FPGA
Spartan-3E: A New Era
DSP Algorithms
New ISE 7.1i Software
Control Your Designs
SERIAL I/O
Page 33
Extend Your Reach
SU
B
SC
R
IB
E
User Programmability
•
Latch-based (Xilinx, Altera, …)
+ reconfigurable
– volatile
– relatively large.
Spring 2012
• Latches are used to:
1. control a switch to make or break
cross-point connections in the
interconnect
2. define the function of the logic
blocks
3. set user options:
• within the logic blocks
• in the input/output blocks
• global reset/clock
• “Configuration bit stream” is
loaded under user control
CS 150 - Lec02-logic-FPGA
Page 34
Background (review) for upcoming
• A MUX or multiplexor is a combinational logic circuit that
chooses between 2N inputs under the control of N control
signals.
• A latch is a 1-bit memory (similar to a flip-flop).
Spring 2012
EECS150 - Lec02-logic-FPGA
Page 35
Idealized FPGA Logic Block
• 4-input look up table (LUT)
– implements combinational logic functions
• Register
– optionally stores output of LUT
Spring 2012
CS 150 - Lec02-logic-FPGA
Page 36
4-LUT Implementation
•
n-bit LUT is implemented as a 2n x 1
memory:
– inputs choose one of 2n memory
locations.
– memory locations (latches) are
normally loaded with values from user’s
configuration bit stream.
– Inputs to mux control are the CLB
inputs.
•
Result is a general purpose “logic
gate”.
– n-LUT can implement any function of n
inputs!
Spring 2012
Page 37
CS 150 - Lec02-logic-FPGA
LUT as general logic gate
•
An n-lut as a direct implementation of a
function truth-table.
•
Each latch location holds the value of the
function corresponding to one input
combination.
Example: 4-lut
Example: 2-lut
Implements any function of 2 inputs.
How many of these are there?
How many functions of n inputs?
Spring 2012
CS 150 - Lec02-logic-FPGA
Page 38
FPGA Generic Design Flow
•
Design Entry:
– Create your design files using:
• schematic editor or
• HDL (hardware description languages: Verilog, VHDL)
•
Design Implementation:
– Logic synthesis (in case of using HDL entry) followed by,
– Partition, place, and route to create configuration bit-stream file
•
Design verification:
– Optionally use simulator to check function,
– Load design onto FPGA device (cable connects PC to development
board), optional “logic scope” on FPGA
• check operation at full speed in real environment.
Spring 2012
CS 150 - Lec02-logic-FPGA
Page 39
Example Partition, Placement, and Route
•
Idealized FPGA structure:
•
Example Circuit:
– collection of gates and flip-flops
Circuit combinational logic must be “covered” by 4-input 1-output LUTs.
Flip-flops from circuit must map to FPGA flip-flops.
(Best to preserve “closeness” to CL to minimize wiring.)
Best placement in general attempts to minimize wiring.
Vdd, GND, clock, and global resets are all “prewired”.
Spring 2012
CS 150 - Lec02-logic-FPGA
Page 40
Example Partition, Placement, and Route
OUT
IN
•
Example Circuit:
– collection of gates and flip-flops
A
A
B
B
Two partitions. Each has single output, no more than 4 inputs, and
no more than 1 flip-flop. In this case, inverter goes in both partitions.
Note: the partition can be arbitrarily large as long as it has not more
than 4 inputs and 1 output, and no more than 1 flip-flop.
Spring 2012
CS 150 - Lec02-logic-FPGA
Page 41
Xilinx FPGAs (interconnect detail)
Spring 2012
CS 150 - Lec02-logic-FPGA
Page 42
ation-Specific Modular BLock Architecture)
Colors represent
different types
of resources:
Logic
Block RAM
DSP (ALUs)
Clocking
I/O
Serial I/O + PCI
of ASMBL
A routing fabric
runs throughout
the chip to wire
everything
Spring 2012
together.
ck Architecture)
R
Chapter 5
EECS150 - Lec02-logic-FPGA
Configurable Logic Blocks!"#$%&')
(CLBs)
!"#$%&'4
slide 7
Page 43
Serial ()*
Configurable Logic Blocks (CLBs)
CLB Overview
The Configurable Logic Blocks (CLBs) are the main logic resources for implementing
sequential as well as combinatorial circuits. Each CLB element is connected to a switch
matrix for access to the general routing matrix (shown in Figure 5-1). A CLB element
contains a pair of slices. These two slices do not have direct connections to each other, and
each slice is organized as a column. Each slice in a column has an independent carry chain.
For each CLB, slices in the bottom of the CLB are labeled as SLICE(0), and slices in the top
of the CLB are labeled as SLICE(1).
Slices define regular connections to the
switching fabric, and to slices in
CLBs above and below it on the die.
COUT
COUT
CLB
Slice(1)
Switch
Matrix
Slice(0)
CIN
CIN
UG190_5_01_122605
Figure 5-1: Arrangement of Slices within the CLB
The
LX110T has 17,280 slices.
The Xilinx tools designate slices with the following definitions. An “X” followed by a
Spring 2012
EECS150 - Lec02-logic-FPGA
Page 44
number identifies the position of each slice in a pair as well as the column position of the
slice. The “X” number counts slices starting from the bottom in sequence 0, 1 (the first CLB
column); 2, 3 (the second CLB column); etc. A “Y” followed by a number identifies a row of
slices. The number remains the same within a CLB, but counts up in sequence from one
CLB row to the next CLB row, starting from the bottom. Figure 5-2 shows four CLBs
located in the bottom-left corner of the die.
X-Y naming convention for slices
X0, X2, ... are lower CLB slices.
Chapter 5: Configurable Logic Blocks
X1,(CLBs)
X3, ... are upper CLB slices.
Y0, Y1, ... are CLB column positions.
COUT
COUT
COUT
CLB
R
COUT
CLB
Slice
X1Y1
Slice
X3Y1
Slice
X0Y1
Slice
X2Y1
CIN
CIN
CIN
CIN
COUT
COUT
COUT
COUT
CLB
CLB
Slice
X1Y0
Slice
X3Y0
Slice
X0Y0
Slice
X2Y0
EECS150
- Lec02-logic-FPGA
Lower-left
corner
of thebetween
die.CLBs and Slices
Figure 5-2: Row and
Column Relationship
UG190_5_02_122605
Spring 2012
Page 45
Slice Description
Every slice contains four logic-function generators (or look-up tables), four storage
elements, wide-function multiplexers, and carry logic. These elements are used by all slices
to provide logic, arithmetic, and ROM functions. In addition to this, some slices support
two additional functions: storing data using distributed RAM and shifting data with 32-bit
registers. Slices that support these additional functions are called SLICEM; others are
called SLICEL. SLICEM (shown in Figure 5-3) represents a superset of elements and
connections found in all slices. SLICEL is shown in Figure 5-4.
R
Atoms: 5-input Look Up Tables (LUTs)
LUT6
1
2
A2
3
A3
4
A4
5
A5
6
A6
Q
LUT5
Q
A2
A[6:2]
D
A3
00000
A4 LUT5
A5
00001
A6
00010
Figure 3:
..
..
D
Computes any 5input logic
function.
(1)
D
1
0
1
(0)
A[6:2]
Q
(1)
..
..
D6
D5
D
WP245_03_051006
Block Diagram of a Virtex-5 6-Input LUT
6-input LUT leads to several benefits:
Timing is
independent
of function.
Q
As it implements wider functions directly in the LUT, the number of logic levels
(0)
between registers is reduced, leading to higher performance.
It implements significantly more logic than a LUT with four inputs.
Power consumption is reduced because the larger LUT reduces the amount of
172
required interconnect (routing resources).
11101
0
Q
(0)
0
11110
New aspect ratios for distributed RAM. Every LUT can be configured as a 64 x 1 or
1a much denser and faster
11111
32 x 2 distributed RAM. Benefits
for the designer are
www.xilinx.com
Virtex-5 family SLICEM LUTs also provide additional benefits:
Q
mplementation of distributed RAM with increased flexibility.
(1)
Spring
2012
Longer SRL chains. A single LUT supports
a 32-bit
SRL. A slice can thus implement EECS150 - Lec02-logic-FPGA
a shift register of up to 128 bits, providing significant area savings and reduced
routing resources in comparison to previous architectures. Shift registers are
features available only in Xilinx devices. The Xilinx ISE™ software packer
automatically packs two 16-bit SRLs with common addressing but different data.
In other words, if the application needs a 16-bit deep, 8-bit-wide shift register, it
can be implemented in a single slice.
Latches
set during
configuration.
Page 46
Virtex-5 FPGA User Guide
UG190 (v4.2) May 9, 2008
Virtex-5 6-LUTs: Composition of 5-LUTs
ExpressFabric Technology
R
May be used
as one
6-input LUT
(D6 out) ...
LUT6
A1
A2
A2
A3
A3
A4
A4
A5
A5
A6
A6
LUT5
D
... or as two
5-input LUTS
(D6 and D5)
D6
A2
A3
A4
D
LUT5
D5
A5
A6
WP245_03_051006
Figure 3:
Block Diagram of a Virtex-5 6-Input LUT
Combinational
logic
The 6-input LUT leads to several benefits:
•
The LX110T has 69,120 6-LUTs
6-LUT delay is 0.9 ns
As it implements wider functions directly in the LUT, the number of logic levels
between registers is reduced, leading to higher performance.
• It implements significantly more logic than a LUT with four inputs.
• Power consumption is reduced because the larger LUT reduces the amount of
requiredSpring
interconnect
2012 (routing resources).
EECS150 - Lec02-logic-FPGA
rable Logic Blocks (CLBs)
The Virtex-5 family SLICEM LUTs also provide additional benefits:
(post configuration)
Page 47
R
•
New aspect ratios for distributed RAM. Every LUT can be configured as a 64 x 1 or
32 x 2 distributed RAM. Benefits for the designer are a much denser and faster
implementation of distributed RAM with increased flexibility.
esigning Large Multiplexers
• Longer SRL chains. A single LUT supports a 32-bit SRL. A slice can thus implement
a shift register of up to 128 bits, providing significant area savings and reduced
4:1 Multiplexer
routing resources in comparison to previous architectures. Shift registers are
Each Virtex-5 LUT can befeatures
configured
into only
a 4:1 in
MUX.
The
4:1 MUX
beISE™
implemented
available
Xilinx
devices.
The can
Xilinx
software packer
with a flip-flop in the same
slice. Up topacks
four 4:1
can be
implemented
in a slice,
asdifferent data.
automatically
twoMUXes
16-bit SRLs
with
common addressing
but
shown in Figure 5-21. In other words, if the application needs a 16-bit deep, 8-bit-wide shift register, it
can be implemented in a single slice.
The simplest view of a slice
SLICE
Routing and Interconnect Architecture
LUTadvancements, interconnect timing delays can account for
With process technology
(D)
more than 50% of the critical
path delay. A new diagonally
symmetric interconnect
O6
4:1 MUX Output
pattern, developed for the Virtex-5 family, enhances performance by reaching more
(DQ)
Registered
places (D[6:1])
in fewer6hops. The new pattern allows
SEL D [1:0], DATA D [3:0]
D Q for more logic connections to be made
A[6:1]
Output pattern makes it easier
Input
within two or three hops. Moreover, the more regular routing
for the Xilinx ISE software to find the most optimal routes. All of the interconnect
features are transparent to FPGA designers,
but translate to higher overall
(Optional)
LUT
Four 6-LUTs
Four Flip-Flops
(C)
O6
WP245SEL
(v1.1.1)
JulyDATA
7, 2006
C [1:0],
C [3:0]
Input
(C[6:1]) 6
A[6:1]
www.xilinx.com
D Q
(CQ)
(B)
O6
SEL B [1:0], DATA B [3:0]
Input
D Q
A[6:1]
(A)
O6
CLK
(A[6:1])
6
A[6:1]
(CLK)
(BQ)
D Q
(AQ)
(Optional)
Spring 2012
Figure 5-21:
Four 4:1 Multiplexers in a Slice
Switching fabric may see
combinational and registered
outputs.
5
4:1 MUX Output
Registered
Output
(Optional)
LUT
SEL A [1:0], DATA A [3:0]
Input
Registered
Output
(Optional)
LUT
(B[6:1]) 6
4:1 MUX Output
An actual Virtex-5 slice adds
many small features to this
simplified diagram. We show
them one by one ...
4:1 MUX Output
Registered
Output
EECS150
- Lec02-logic-FPGA
UG190_5_21_050506
Page 48
CLB Overview
8:1 Multiplexer
Each slice has an F7AMUX and an F7BMUX. These two muxes combine the output of two
LUTs to form a combinatorial function up to 13 inputs (or an 8:1 MUX). Up to two 8:1
MUXes can be implemented in a slice, as shown in Figure 5-22.
Two 7-LUTs per slice ...
SLICE
LUT
O6
SEL D [1:0], DATA D [3:0]
Input (1)
(D[6:1]) 6
A[6:1]
F7BMUX
(CMUX)
LUT
O6
SEL C [1:0], DATA C [3:0]
Input (1)
SELF7(1)
CLK
(C[6:1]) 6
D Q
A[6:1]
(CQ)
8:1 MUX
Output (1)
Registered
Output
(Optional)
(CX)
Extra
multiplexers(F7AMUX,
F7BMUX)
(CLK)
Extra inputs
(AX and CX)
LUT
O6
SEL B [1:0], DATA B [3:0]
Input (2)
(B[6:1]) 6
A[6:1]
F7AMUX
(AMUX)
LUT
O6
SEL A [1:0], DATA A [3:0]
Input (2)
SELF7(2)
(A[6:1])
6
D Q
(AQ)
8:1 MUX
Output (2)
Registered
Output
A[6:1]
(Optional)
(AX)
Spring 2012
Figure 5-22:
Page 49
EECS150 - Lec02-logic-FPGA
UG190_5_22_090806
Two 8:1 Multiplexers in a Slice
Chapter 5: Configurable Logic Blocks (CLBs)
R
16:1 Multiplexer
Each slice has an F8MUX. F8MUX combines the outputs of F7AMUX and F7BMUX to form
a combinatorial function up to 27 inputs (or a 16:1 MUX). Only one 16:1 MUX can be
implemented in a slice, as shown in Figure 5-23.
Or one 8-LUTs per slice ...
SLICE
LUT
O6
SEL D [1:0], DATA D [3:0]
Input
(D[6:1]) 6
F7BMUX
A[6:1]
LUT
www.xilinx.com
x-5 FPGA User Guide
90 (v4.2) May 9, 2008
195
Third
multiplexer(F8MUX)
O6
SEL C [1:0], DATA C [3:0]
Input
SELF7
(C[6:1]) 6
A[6:1]
F8MUX
(CX)
(BMUX)
LUT
D Q
O6
SEL B [1:0], DATA B [3:0]
Input
(B[6:1]) 6
F7AMUX
A[6:1]
(B)
16:1 MUX
Output
Registered
Output
Third input
(BX)
(Optional)
LUT
O6
SEL A [1:0], DATA A [3:0]
Input
SELF7
SELF8
CLK
(A[6:1])
6
Configuring the
“n” of an n-LUT ...
A[6:1]
(AX)
(BX)
(CLK)
Spring 2012
Figure 5-23:
UG190_5_23_050506
EECS150 - Lec02-logic-FPGA
16:1 Multiplexer in a Slice
It is possible to create multiplexers wider than 16:1 across more than one SLICEM.
However, there are no direct connections between slices to form these wide multiplexers.
Fast Lookahead Carry Logic
Page 50
CLB / Slice Timing Models
R
General Slice Timing Model and Parameters
Extra muxes to chose LUT option ...
A simplified Virtex-5 slice is shown in Figure 5-25. Some elements of the Virtex-5 slice are
omitted for clarity. Only the elements relevant to the timing paths described in this section
are shown.
LUT
D
Inputs
6
O6
D
O5
DMUX
FE/LAT
Q
D
CE
CLK
DX
From eight 5-LUTs
... to one 8-LUT.
DQ
SR REV
LUT
C
Inputs
6
F7BMUX
O6
Combinational
or registered outs.
C
CMUX
O5
FE/LAT
D
CX
F8MUX
Q
CQ
CE
CLK
SR REV
LUT
B
Inputs
6
O6
B
BMUX
O5
FE/LAT
D
BX
Q
BQ
CE
CLK
SR REV
LUT
A
Inputs
6
Flip-flops unused by
LUTs can be used
standalone.
F7AMUX
O6
A
AMUX
O5
FE/LAT
Q
D
AX
AQ
CE
CLK
CE
SR REV
CLK
SR
REV
(DX)
Spring 2012
EECS150 - Lec02-logic-FPGA
UG190_5_25_050506
Figure 5-25:
Virtex-5 FPGA User Guide
UG190 (v4.2) May 9, 2008
Page 51
Simplified Virtex-5 Slice
199
www.xilinx.com
CLB Overview
R
carry multiplexer (MUXCY) can also be used to cascade function generators for
implementing wide logic functions.
Virtex 5 Verical Logic
Figure 5-24 illustrates the carry chain with associated logic elements in a slice.
COUT (To Next Slice)
Carry Chain Block
(CARRY4)
CO3
S3
O6 From LUTD
DMUX/DQ*
MUXCY
O3
O5 From LUTD
DMUX
DI3
D Q
DX
We can map
ripple-carry addition onto
carry-chain block.
DQ
(Optional)
CO2
S2
O6 From LUTC
CMUX/CQ*
MUXCY
O2
O5 From LUTC
CMUX
DI2
D Q
CX
CQ
(Optional)
CO1
S1
O6 From LUTB
BMUX/BQ*
MUXCY
O1
O5 From LUTB
BMUX
DI1
D Q
BX
BQ
(Optional)
CO0
S0
O6 From LUTA
AMUX/AQ*
MUXCY
O0
O5 From LUTA
D Q
AX
CYINIT
CIN
01
Spring 2012
Figure 5-24:
CIN (From Previous Slice)
The carry-chain block
also useful for speeding
up other adder
EECS150 - Lec02-logic-FPGA
structures and counters.
Page 52
AMUX
DI0
AQ
(Optional)
* Can be used if
unregistered/registered
outputs are free.
UG190_5_24_050506
Fast Carry Logic Path and Associated Elements
The carry chains carry lookahead logic along with the function generators. There are ten
independent inputs (S inputs – S0 to S3, DI inputs – DI1 to DI4, CYINIT and CIN) and eight
independent outputs (O outputs – O0 to O3, and CO outputs – CO0 to CO3).
The S inputs are used for the “propagate” signals of the carry lookahead logic. The
“propagate” signals are sourced from the O6 output of a function generator. The DI inputs
Chapter 5: Configurable Logic Blocks (CLBs)
R
Putting it all together ... a SLICEL.
Reset Type
COUT
The previous
slides explain all
SLICEL features.
Sync
Async
DMUX
D6
D5
D4
D3
D2
D1
A6
A5
A4
A3
A2
A1
LUT
ROM
D
D
O6
O5
DX
D
CE
CK
DX
FF
LATCH
Q
INIT1
INIT0
SRHIGH
SRLOW
SR REV
DQ
CMUX
C6
C5
C4
C3
C2
C1
A6
A5
A4
A3
A2
A1
LUT
ROM
C
O6
O5
CX
CX
D
CE
CK
About 50% of the
17,280 slices in
an LX110T are
SLICELs.
C
FF
LATCH
Q
INIT1
INIT0
SRHIGH
SRLOW
SR REV
CQ
BMUX
B6
B5
B4
B3
B2
B1
A6
A5
A4
A3
A2
A1
LUT
ROM
B
O6
O5
BX
D
CE
CK
BX
B
FF
LATCH
Q
INIT1
INIT0
SRHIGH
SRLOW
SR REV
BQ
The other slices
are SLICEMs,
and have extra
features.Page 53
AMUX
A6
A5
A4
A3
A2
A1
A6
A5
A4
A3
A2
A1
LUT
ROM
AX
SR
CE
CLK
A
O6
O5
AX
0/1
Spring 2012
D
CE
CK
A
FF
LATCH
Q
INIT1
INIT0
SRHIGH
SRLOW
SR REV
AQ
EECS150 - Lec02-logic-FPGA
CIN
Figure 5-4:
UG190_5_04_032606
Diagram of SLICEL
Each CLB can contain zero or one SLICEM. Every other CLB column contains a SLICEMs.
In addition, the two CLB columns to the left of the DSP48E columns both contain a SLICEL
and a SLICEM.
R
Recall: 5-LUT architecture ...
LUT6
174
Virtex-5 FPGA User Guide
UG190 (v4.2) May 9, 2008
www.xilinx.com
32 Latches.
Configured to 1 or 0.
A2
Q
A3
LUT5
A4
(1)
D
A5
A6
Q
A[6:2] D
A3
(0)
D6
A[6:2]
Some parts of a logic
design need many
state elements.
A2
00000
A4 LUT5
A5
00001
A6
00010
Figure 3:
..
..
D
1
0
1
Q
(1)
..
..
D5
D
WP245_03_051006
Block Diagram of a Virtex-5 6-Input LUT
6-input LUT leads to several benefits:
Q
As it implements wider functions directly in the LUT, the number of logic levels
(0)
between registers is reduced, leading to higher performance.
t implements significantly more logic than a LUT with four inputs.
Power consumption is reduced because the larger LUT reduces the amount of
equired interconnect (routing resources).
11101
0
Q
(0)
0
11110
New aspect ratios for distributed RAM. Every LUT can be configured as a 64 x 1 or
1
11111
32 x 2 distributed RAM. Benefits
for the designer are a much denser and faster
Virtex-5 family SLICEM LUTs also provide additional benefits:
Q
SLICEMs replace
normal 5-LUTs with
circuits that can act
like 5-LUTs, but can
alternatively use the 32
latches as RAM, ROM,
shift registers.
mplementation of distributed RAM with increased flexibility.
(1)
Spring 2012
EECS150 - Lec02-logic-FPGA
Longer SRL chains. A single LUT supports a 32-bit SRL. A slice can thus implement
a shift register of up to 128 bits, providing significant area savings and reduced
outing resources in comparison to previous architectures. Shift registers are
eatures available only in Xilinx devices. The Xilinx ISE™ software packer
automatically packs two 16-bit SRLs with common addressing but different data.
n other words, if the application needs a 16-bit deep, 8-bit-wide shift register, it
an be implemented in a single slice.
Page 54
Virtex-5 DSP48E Slice
Efficient implementation of
multiply, add, bit-wise logical.
LX110T has 64 in a
single column.
Spring 2012
EECS150 - Lec02-logic-FPGA
Page 55
Spring 2012
EECS150 - Lec02-logic-FPGA
Page 56
ation-Specific Modular BLock Architecture)
To be continued ...
Throughout the
semester, we will
look at different
Virtex-5 features
in-depth.
Switch fabric
Block RAM
DSP48 (ALUs)
Clocking
I/O
Serial
I/O + PCI
Spring 2012
!"#$%&'4
slide 7
EECS150 - Lec02-logic-FPGA
!"#$%&')
Page 57
Serial ()*