VLSI Design Automation
Maurizio Palesi
The Inverted Pyramid
Electronic Systems > $1 Trillion
Semiconductor > $220 B
CAD $3 B
IC Products
• Processors
  – CPU, DSP, controllers
• Memory chips
  – RAM, ROM, EEPROM
• Analog
  – Mobile communication, audio/video processing
• Programmable
  – PLA, FPGA
• Embedded systems
  – Used in cars, factories
  – Network cards
• System-on-chip (SoC)
Example: Intel Processor Sizes
[Figure: Intel processor sizes by silicon process technology. Process technologies shown: 1.5µ, 1.0µ, 0.8µ, 0.6µ, 0.35µ, 0.25µ. Processors shown: Intel386™ DX, Intel486™ DX, Pentium®, Pentium® Pro and Pentium® II.]
Intel Microprocessor Performance
Increasing Device and Context Complexity
• Exponential increase in device complexity
  – Increasing with Moore's law (or faster)!
• More complex system contexts
  – System contexts in which devices are deployed (e.g. cellular radio) are increasing in complexity
• Require exponential increases in design productivity
Complexity: We have exponentially more transistors!
Deep Submicron Effects
• Smaller geometries are causing a wide variety of effects that we have largely ignored in the past:
  – Cross-coupled capacitances
  – Signal integrity
  – Resistance
  – Inductance
DSM Effects: Design of each transistor is getting more difficult!
Heterogeneity on Chip
• Greater diversity of on-chip elements
  – Processors
  – Software
  – Memory
  – Analog
Heterogeneity: More transistors doing different things!
Stronger Market Pressures
• Decreasing design window
• Less tolerance for design revisions
Time-to-market: Exponentially more complex, greater design risk, greater variety, and a smaller design window!
A Quadruple
Complexity
Heterogeneity
Time-to-market
DSM Effects
How Are We Doing?
[Figure: logic transistors per chip (K) and productivity (transistors per staff-month) plotted on logarithmic axes against year, 1981 to 2009.]
• Complexity: 58% / yr. compound complexity growth rate
• Productivity: 21% / yr. compound productivity growth rate
• The widening difference between the two curves is the productivity gap
Role of EDA: close the productivity gap
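The two compound growth rates quoted for this chart can be turned into a rough projection of how fast the gap widens. The sketch below assumes that complexity and productivity are both normalised to 1.0 in the same start year, which the chart does not state; the function and constant names are illustrative only.

```python
# Rough projection of the design productivity gap from the two compound
# growth rates quoted on the slide.
# Assumption: both curves are normalised to 1.0 in the same start year.

COMPLEXITY_GROWTH = 0.58    # 58% per year, logic transistors per chip
PRODUCTIVITY_GROWTH = 0.21  # 21% per year, transistors per staff-month


def gap_factor(years: int) -> float:
    """Ratio of complexity growth to productivity growth after `years` years."""
    complexity = (1.0 + COMPLEXITY_GROWTH) ** years
    productivity = (1.0 + PRODUCTIVITY_GROWTH) ** years
    return complexity / productivity


if __name__ == "__main__":
    for years in (1, 5, 10, 20):
        print(f"after {years:2d} years the gap has grown by a factor of "
              f"{gap_factor(years):.1f}")
```

Under these assumptions the design effort needed per chip grows by about 1.58 / 1.21 ≈ 1.31 times per year unless tools raise productivity.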
Trends
• By 2006, an inexpensive 200 mm² die would have 75M logic transistors
• A 1000 mm² die would have 400M logic transistors
Core                    Trans. Count
486DX4                  0.7 million
Pentium/MMX             2.8 million
MPEG-2 encoder          1.5 million
MPEG-2 decoder          0.5 million
8051 microcontroller    0.05 million
• Up to 7500 8051 microcontrollers on a single chip!
• 150 – 750 cores on a single chip
Pentium/MMX
IC Design Steps
Specifications → High-level Description → Functional Description
• Behavioral: VHDL, C
• Structural: VHDL
IC Design Steps
Specifications → High-level Description → Functional Description → Synthesis → Logic Description
→ Technology Mapping → Gate-level Design → Physical Design → Placed & Routed Design
→ Fabrication → Packaging
Example of a logic description:
X = (AB*CD) + (A+D) + (A(B+C))
Y = (A(B+C) + AC + D + A(BC+D))
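Logic synthesis starts from expressions like X and Y and looks for cheaper equivalent forms. As a small illustration, the sketch below takes both expressions exactly as printed (reading `*` and juxtaposition as AND, `+` as OR; any complement bars lost in extraction are not reconstructed) and checks by exhaustive truth table that each equals a shorter candidate. The simplified forms and the helper name `equivalent` are this example's own, not from the slides.

```python
from itertools import product


def x_original(a, b, c, d):
    # X = (AB*CD) + (A+D) + (A(B+C)), as printed
    return (a and b and c and d) or (a or d) or (a and (b or c))


def y_original(a, b, c, d):
    # Y = (A(B+C) + AC + D + A(BC+D)), as printed
    return (a and (b or c)) or (a and c) or d or (a and ((b and c) or d))


def x_simplified(a, b, c, d):
    # Candidate simplification: X = A + D
    return a or d


def y_simplified(a, b, c, d):
    # Candidate simplification: Y = A(B+C) + D
    return (a and (b or c)) or d


def equivalent(f, g, n_inputs=4):
    """True if f and g agree on every combination of n_inputs Boolean inputs."""
    return all(
        bool(f(*bits)) == bool(g(*bits))
        for bits in product([False, True], repeat=n_inputs)
    )


if __name__ == "__main__":
    print("X == A + D:", equivalent(x_original, x_simplified))
    print("Y == A(B+C) + D:", equivalent(y_original, y_simplified))
```

An exhaustive check is only practical for a handful of inputs; it is used here because four variables give just 16 rows.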
Design of Microelectronic Circuits
• Design: Idea → Modeling → Synthesis and Optimization → Validation
• Fabrication: Mask Fabrication → Wafer Fabrication
• Testing: Tester applying test patterns (e.g. 1000010001, 0010101010, 1100101010, 1101110001)
• Packaging: Slicing → Packaging
Circuit Models
• A model of a circuit is an abstraction
  – A representation that shows relevant features without associated details
Circuit Model (few details) → Synthesis → Circuit Model (many details)
Model Classification
Models
• Levels of Abstraction: Architectural, Logic, Geometrical
• Circuit Views: Behavioral, Structural, Physical
Levels of Abstraction
• Architectural
  – A circuit performs a set of operations, such as data computation or transfer
    · HDL models, flow diagrams, …
• Logic
  – A circuit evaluates a set of logic functions
    · FSMs, schematics, …
• Geometrical
  – A circuit is a set of geometrical entities
    · Floor plans, layouts, …
Levels of Abstraction
• Architectural, e.g.
    …
    PC = PC + 1;
    Fetch(PC);
    Decode(Inst);
    ...
• Logic
• Geometrical
Design consists of refining the abstract specification of the architectural model into the detailed geometrical-level model.
Views of a Model
• Behavioral
  – Describe the function of a circuit regardless of its implementation
• Structural
  – Describe a model as an interconnection of components
• Physical
  – Relate to the physical objects (e.g., transistors) of a design
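The difference between the behavioral and structural views can be illustrated with a one-bit full adder, chosen here purely as an example (it does not appear on the slides). The behavioral function states what is computed; the structural one wires together named gate components. Both are Python sketches, not HDL.

```python
from itertools import product


def full_adder_behavioral(a: int, b: int, cin: int) -> tuple[int, int]:
    """Behavioral view: what the circuit computes, with no implementation."""
    total = a + b + cin
    return total % 2, total // 2  # (sum, carry-out)


# Structural view: components plus their interconnection.
def xor_gate(p: int, q: int) -> int:
    return p ^ q


def and_gate(p: int, q: int) -> int:
    return p & q


def or_gate(p: int, q: int) -> int:
    return p | q


def full_adder_structural(a: int, b: int, cin: int) -> tuple[int, int]:
    """Structural view: two half adders and an OR gate wired together."""
    s1 = xor_gate(a, b)      # first half adder, sum
    c1 = and_gate(a, b)      # first half adder, carry
    s = xor_gate(s1, cin)    # second half adder, sum
    c2 = and_gate(s1, cin)   # second half adder, carry
    return s, or_gate(c1, c2)


if __name__ == "__main__":
    for a, b, cin in product((0, 1), repeat=3):
        assert full_adder_behavioral(a, b, cin) == full_adder_structural(a, b, cin)
    print("behavioral and structural views agree on all 8 input combinations")
```

The physical view would add geometry (where each gate's transistors and wires sit), which is outside what a short script can show.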
The Y-chart
[Figure: Gajski and Kuhn’s Y-chart (Silicon Compilers, Addison-Wesley, 1987). Three axes: Behavioral-view, Structural-view, Physical-view. Levels: Architectural-level, Logic-level, Geometrical-level.]
The Y-chart
[Figure: the Y-chart annotated with examples. Architectural level: an instruction sequence (… PC = PC + 1; Fetch(PC); Decode(Inst); ...) and a block diagram (MULT, ADD, RAM, CTRL). Logic level: a state diagram with states S0 to S3. Geometrical level: Physical-view.]
Synthesis
• High-level synthesis (or architectural synthesis), at the architectural level
  – Scheduling
  – Assignment to resources
  – Interconnection
• Logic synthesis, at the logic level
  – Interconnection of instances of library cells (technology mapping)
• Physical design, at the geometrical level
  – Physical layout of the chip (placement, routing)
[Figure: the three synthesis steps drawn on the Y-chart axes (Behavioral-view, Structural-view, Physical-view).]
Architectural Synthesis
• Identify the resources that can implement the operations
• Schedule the execution time of each operation
• Bind the operations to the resources
[Figure: Behavioral architectural-level circuit model → Architectural Synthesis → Control Unit + Datapath]
Architectural Synthesis (Example)
• Numerically solve (by means of the forward Euler method) the differential equation y’’+3xy’+3y=0 in the interval [0,a] with step size dx and initial values x(0)=x; y(0)=y; y’(0)=u.
diffeq {
    read (x, y, u, dx, a);
    repeat {
        x1 = x + dx;
        u1 = u - (3*x*u*dx) - (3*y*dx);
        y1 = y + u*dx;
        c = x1 < a;
        x = x1; u = u1; y = y1;
    }
    until(c);
    write(y);
}
Behavioral View
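For readers who want to run the behavioral description, here is a minimal Python transcription of the diffeq loop. One assumption is made loudly: the loop keeps iterating while `x1 < a`, which is what integrating over [0, a] requires, whereas the slide's `until(c)` read literally would stop as soon as `x1 < a` holds. The closed-form solution used for comparison, y = exp(-3x²/2) for y(0)=1, y’(0)=0, is not on the slide; it can be checked by substitution into the equation.

```python
import math


def diffeq(x: float, y: float, u: float, dx: float, a: float) -> float:
    """Forward Euler integration of y'' + 3xy' + 3y = 0 from x up to a."""
    while True:
        x1 = x + dx
        u1 = u - (3 * x * u * dx) - (3 * y * dx)
        y1 = y + u * dx
        c = x1 < a
        x, u, y = x1, u1, y1
        if not c:  # assumption: keep iterating while x1 < a
            break
    return y


if __name__ == "__main__":
    approx = diffeq(x=0.0, y=1.0, u=0.0, dx=1e-4, a=1.0)
    exact = math.exp(-1.5)  # y = exp(-3x^2/2) evaluated at x = 1
    print(f"Euler: {approx:.5f}  exact: {exact:.5f}")
```

Each loop iteration performs the same fixed set of multiplications, additions, subtractions and one comparison, which is what makes the loop body a convenient unit for scheduling onto hardware resources.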
Architectural Synthesis (Example)
• Let us assume that the data path of the circuit contains two resources:
  – 1 multiplier
  – 1 ALU (add, sub, comparison)
[Figure: structural view. A data path made of a multiplier (MUL), an ALU and a steering & memory block, driven by a control unit.]
Logic Synthesis
[Figure: Logic Synthesis. Inputs: an HDL description (… PC = PC + 1; Fetch(PC); Decode(Inst); ...) or an FSM (states S0 to S3). Outputs: a schematic (MULT, ADD, RAM, CTRL) or a gate-level netlist.]
Logic Synthesis (Example)
[Figure: the data path (MUL, ALU, steering & memory) with its control unit, and the control unit's state diagram for diffeq. The diagram has nine states, S1 to S9, including a reset state (reading the data) and a state for writing the data. Transitions are labelled r, r’, cr’ and c’r’.]
Logic Synthesis (Example)
[Figure: Logic Synthesis turns the state diagram (states S1 to S9, transitions labelled r, r’, cr’, c’r’) into a sequential circuit made of a state register and the logic blocks δ and λ. Inputs r and c; signals go to and come from the steering & memory module.]
Optimization
• Quality measures
  – Performance
    · Combinational logic circuits
      – Propagation delay through the critical path [sec]
    · Synchronous sequential circuits
      – Cycle time [sec]
      – Latency [clock cycles]
      – Execution time = Latency * Cycle time
      – Throughput (for pipeline organization)
  – Area
    · Logic gates, registers, wiring
  – Power
    · Energy
      – Battery life, system weight, ...
    · Power
      – Packaging, reliability, cost, ...
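The execution-time formula for synchronous sequential circuits is easy to apply, and a short calculation shows why latency alone does not decide which design is faster. The cycle times below are hypothetical numbers picked for illustration; they are not taken from the slides.

```python
def execution_time_ns(latency_cycles: int, cycle_time_ns: int) -> int:
    """Execution time = latency [clock cycles] * cycle time [ns]."""
    return latency_cycles * cycle_time_ns


if __name__ == "__main__":
    # Hypothetical designs: fewer cycles may come with a slower clock.
    designs = {
        "6 cycles at 10 ns": (6, 10),
        "4 cycles at 12 ns": (4, 12),
        "4 cycles at 16 ns": (4, 16),
    }
    for name, (latency, cycle) in designs.items():
        print(f"{name}: {execution_time_ns(latency, cycle)} ns")
```

With these made-up values the 4-cycle design wins at a 12 ns clock but loses at 16 ns, so both factors of the product have to be tracked during optimization.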
Optimization
[Figure: data path with one multiplier (MUL), one ALU, steering & memory and a control unit, next to the scheduled data-flow graph of the diffeq loop body. Inputs: x, y, u, dx, a and the constant 3. Multiplications: 3x, udx, 3xudx, 3y, 3ydx. ALU operations: x1 (+), c (<), y1 (+) and the two subtractions that produce u1.]
6 clock cycles
Optimization
[Figure: data path with two multipliers (MUL) and two ALUs, steering & memory and a control unit, next to the scheduled data-flow graph of the diffeq loop body. Inputs: x, y, u, dx, a and the constant 3. Operations: 3x, udx, 3xudx, 3y, 3ydx, x1 (+), c (<), y1 (+), u-3xudx (-), u1 (-).]
4 clock cycles
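The 6-cycle and 4-cycle figures quoted on the two Optimization slides can be reproduced with a small resource-constrained scheduler. The sketch below makes several assumptions that the slides do not state: the data-flow graph is rebuilt from the operation labels in the figures (with u*dx computed once and shared), every operation takes one clock cycle, and the scheduling method is a simple priority list scheduler chosen for this example. The slides do not say how their schedules were obtained.

```python
# Data-flow graph of one diffeq loop iteration (reconstructed, see lead-in):
# operation name -> (resource type, predecessor operations)
DFG = {
    "3x":      ("MUL", []),
    "udx":     ("MUL", []),
    "3xudx":   ("MUL", ["3x", "udx"]),
    "3y":      ("MUL", []),
    "3ydx":    ("MUL", ["3y"]),
    "x1":      ("ALU", []),                  # x1 = x + dx
    "c":       ("ALU", ["x1"]),              # c = x1 < a
    "y1":      ("ALU", ["udx"]),             # y1 = y + u*dx
    "u-3xudx": ("ALU", ["3xudx"]),
    "u1":      ("ALU", ["u-3xudx", "3ydx"]),
}


def path_length(op, dfg, memo):
    """Longest chain of operations starting at `op`; used as the priority."""
    if op not in memo:
        succs = [s for s, (_, preds) in dfg.items() if op in preds]
        memo[op] = 1 + max((path_length(s, dfg, memo) for s in succs), default=0)
    return memo[op]


def list_schedule(dfg, resources):
    """Return {operation: start cycle} for unit-delay operations.

    `resources` maps a resource type to the number of available units.
    """
    memo, start, cycle = {}, {}, 0
    while len(start) < len(dfg):
        cycle += 1
        for kind, count in resources.items():
            # Ready = right type, unscheduled, all inputs done in earlier cycles.
            ready = [
                op for op, (k, preds) in dfg.items()
                if k == kind and op not in start
                and all(p in start and start[p] < cycle for p in preds)
            ]
            ready.sort(key=lambda op: -path_length(op, dfg, memo))
            for op in ready[:count]:
                start[op] = cycle
    return start


def latency(dfg, resources):
    """Number of clock cycles needed by the schedule."""
    return max(list_schedule(dfg, resources).values())


if __name__ == "__main__":
    print("1 MUL, 1 ALU:", latency(DFG, {"MUL": 1, "ALU": 1}), "clock cycles")
    print("2 MUL, 2 ALU:", latency(DFG, {"MUL": 2, "ALU": 2}), "clock cycles")
```

Giving priority to operations on the longest remaining path is a common heuristic; it is not guaranteed to be optimal in general, although for this small graph it matches the cycle counts on the slides.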
Design Space
• Parameters
  – # of ALUs (max 2), # of multipliers (max 2)
• Design space
  – System A = 1 ALU, 1 multiplier
  – System B = 1 ALU, 2 multipliers
  – System C = 2 ALUs, 1 multiplier
  – System D = 2 ALUs, 2 multipliers
• Let us assume
  – Multiplier = 5 units of area
  – ALU = 1 unit of area
  – Control unit + steering logic = 1 unit of area
[Figure: area (axis ticks 7 to 15) versus latency in clock cycles (axis ticks 1 to 7) for systems A, B, C and D, distinguishing Pareto points from dominated points.]
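The area model above can be combined with a latency per system to find the Pareto points. In the sketch below the latencies of A (6 cycles) and D (4 cycles) are the values quoted on the Optimization slides; the latencies of B and C are assumptions of this example, based on counting five multiplications and five ALU operations in the loop body: one ALU needs at least five cycles for five ALU operations, and one multiplier needs five cycles plus a final subtraction.

```python
# Area units from the slide.
MUL_AREA, ALU_AREA, CTRL_AREA = 5, 1, 1

SYSTEMS = {
    # name: (number of ALUs, number of multipliers, latency in clock cycles)
    "A": (1, 1, 6),
    "B": (1, 2, 5),  # latency assumed, not given on the slides
    "C": (2, 1, 6),  # latency assumed, not given on the slides
    "D": (2, 2, 4),
}


def area(n_alu: int, n_mul: int) -> int:
    """Total area: functional units plus control unit and steering logic."""
    return n_mul * MUL_AREA + n_alu * ALU_AREA + CTRL_AREA


def design_points(systems):
    """Map each system name to its (area, latency) point."""
    return {
        name: (area(n_alu, n_mul), lat)
        for name, (n_alu, n_mul, lat) in systems.items()
    }


def dominates(p, q) -> bool:
    """p dominates q if it is no worse in both metrics and not identical."""
    return p[0] <= q[0] and p[1] <= q[1] and p != q


def pareto_points(points):
    """Names of the points that no other point dominates."""
    return sorted(
        name for name, p in points.items()
        if not any(dominates(q, p) for q in points.values())
    )


if __name__ == "__main__":
    points = design_points(SYSTEMS)
    for name, (a, lat) in points.items():
        print(f"System {name}: area {a}, latency {lat}")
    print("Pareto points:", pareto_points(points))
```

With these numbers System C costs one more unit of area than System A without reducing latency, so it is dominated; changing the assumed latencies would change that conclusion.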
Trends
• By 2006, an inexpensive 200 mm² die would have 75M logic transistors, or 375M SRAM transistors
• A 1000 mm² die would have 400M logic transistors
Core                    Trans. Count
486DX4                  0.7 million
Pentium/MMX             2.8 million
MPEG-2 encoder          1.5 million
MPEG-2 decoder          0.5 million
8051 microcontroller    0.05 million
• Up to 7500 8051 microcontrollers on a single chip!
• 150 – 750 cores on a single chip
Pentium/MMX
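Core counts like the ones on this slide can be estimated by dividing a transistor budget by the transistor count of one core. The sketch below does exactly that and nothing more: it ignores wiring, memory and I/O, and the slide does not say which budget its figures were derived from. The function name is illustrative.

```python
# Transistor counts per core, from the table on the slide.
CORE_TRANSISTORS = {
    "486DX4": 700_000,
    "Pentium/MMX": 2_800_000,
    "MPEG-2 encoder": 1_500_000,
    "MPEG-2 decoder": 500_000,
    "8051 microcontroller": 50_000,
}


def cores_per_die(budget: int, core: str) -> int:
    """Copies of `core` that fit in `budget` transistors (cores only)."""
    return budget // CORE_TRANSISTORS[core]


if __name__ == "__main__":
    for budget in (75_000_000, 375_000_000, 400_000_000):
        print(f"budget {budget // 1_000_000}M transistors:")
        for core in CORE_TRANSISTORS:
            print(f"  {core}: {cores_per_die(budget, core)}")
```

For instance, a 375M budget divided by the 0.05M transistors of an 8051 gives 7500, which coincides with the figure quoted above.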
The Evolution of SoC Platforms
General-purpose Scalable RISC Processor
• 50 to 300+ MHz
• 32-bit or 64-bit
Library of Device IP Blocks
• Image coprocessors
• DSPs
• UART
• 1394
• USB
Scalable VLIW Media Processor
• 100 to 300+ MHz
• 32-bit or 64-bit
Nexperia™ System Buses
• 32-128 bit
2 Cores: Philips’ Nexperia PNX8850 SoC platform for High-end digital video (2001)
Running Forward…
• Four 350/400 MHz StarCore SC140 DSP extended cores
• 16 ALUs: 5600/6400 MMACS
• 1436 KB of internal SRAM & multi-level memory hierarchy
• Internal DMA controller supports 16 TDM unidirectional channels
• Two internal coprocessors (TCOP and VCOP) to provide special-purpose processing capability in parallel with the core processors
6 Cores: Motorola’s MSC8126 SoC platform for 3G base stations (late 2003)
What's Happening in SoCs?
• Technology: no slow-down in sight!
  – Faster and smaller transistors: 90 → 65 → 45 nm
  – … but slower wires, lower voltage, more noise!
    · 80% or more of the delay of critical paths will be due to interconnects
• Design complexity: from 2 to 10 to 100 cores!
  – Design reuse is essential
  – … but differentiation/innovation is key for winning on the market!
• Performance and power: GOPS for mWs!
  – Performance requirements keep going up
  – … but power budgets don’t!
The Deep Submicron Effects
Communication Architectures
• Shared bus
  – Low area
  – Poor scalability
  – High energy consumption
• Network-on-Chip
  – Scalability and modularity
  – Low energy consumption
  – Increase of design complexity
[Figure: IP cores attached to a shared bus, and IP cores connected through a network-on-chip.]
Design Space Exploration for NoC
Ogras et al., ASAP’05