an311.pdf

ASIC to FPGA Design
Methodology & Guidelines
July 2003, ver. 1.0
Application Note 311
Introduction
The cost of designing ASICs is increasing every year. In addition to the
non-recurring engineering (NRE) and mask costs, development costs are
increasing due to ASIC design complexity. Issues such as power, signal
integrity, clock tree synthesis, and manufacturing defects can add
significant risk and time-to-market delays. To overcome the risk of respins, high NRE costs, and to reduce time-to-market delays, FPGAs offer
a viable and competitive option to ASIC development.
Programmable logic has progressed from being used as glue logic to
today’s FPGAs, where complete system designs can be implemented on
a single device. The number of gates and features has increased
dramatically to compete with capabilities that have traditionally only
been offered through ASIC devices. Figure 1 illustrates the evolution of
FPGA applications that has led to higher density devices, intellectual
property (IP) integration, and high-speed I/O interconnects technology.
All of these elements have allowed FPGAs to play a central role in digital
systems implementations. With the availability of multimillion-gate
FPGA architectures, and support for various third-party EDA tools,
designers can use a design flow similar to that used for ASIC devices to
create system-on-a-programmable-chip (SOPC) designs in FPGAs.
This document addresses the FPGA design guidelines and
methodologies that can be applied for better FPGA device performance
and area utilization. Additionally, this document explains new features,
such as SOPC Builder and the MegaWizard® Plug-In Manager, both
available in the Altera® Quartus® II software tools, that are used to
facilitate and simplify a true system on a programmable chip solution.
And finally, this document draws analogies, pro and con, between typical
FPGA and ASIC design processes.
Altera Corporation
AN-311-1.0
1
Preliminary
ASIC to FPGA Design Methodology & Guidelines
ASIC & FPGA Design Flows
Figure 1. Application of FPGA Devices from 1985 to the Present
SOPC
Design
Increasing
PLD
Complexity
Complex
Control
Control
Logic
Glue
Logic
1985
ASIC & FPGA
Design Flows
2
Preliminary
Block-based
Design
Second-generation Synthesis
IP Megafunctions
Synthesis
Macrofunctions
Equations
Schematics
1990
1995
2003
Typical ASIC and FPGA design flows are shown in Figure 2. The backend design of an ASIC device involves a wide variety of complex tasks,
including placement and physical optimization, clock tree synthesis,
signal integrity, and routing using different EDA software tools. When
compared to ASIC devices, the physical, or back-end, design of FPGA
devices is very simple and is accomplished with a single software tool,
the Quartus II software. This not only reduces the complexity of the
design process, but also significantly reduces cost. This document
discusses and compares each of the tasks involved in the design flows, for
both FPGA and ASIC devices.
Altera Corporation
ASIC & FPGA Design Flows
ASIC to FPGA Design Methodology & Guidelines
Figure 2. A Comparison of ASIC & FPGA Design Flows
Tasks
RTL
(Verilog HDL
IP Instantiator)
Timing
and
Area
Optimization
Place & Route,
Re-synthesis, I/O
Placement,
Timing Analysis
FPGA
Design Flow
ASIC
Design Flow
Design
Specification
Design
Specification
Design
Development
Design
Development/
Physical and
Power Planning
Functional
Simulation
Functional
Simulation
Synthesis
(Area and
Timing)
Synthesis/Physical
(Area, Power, and
Timing)
Timing and
Power Analysis
Test Synthesis
(Scan Insertion,
BIST Synthesis)
Formal
Verification and
Timing Analysis
Placement and
Physical
Optimization
Formal
Verification and
Post-Layout
Timing Analysis
Floor
Planning and
Place & Route
Clock Tree
Synthesis
Routing
Signal
Integrity
Tasks
RTL
(Verilog HDL
IP Instantiator)
Skew and Timing
Analysis
DRC/ERC, Manual
Layout Fixes,
Hold Time Fixes
IR drops, X-talk
Sign-off
Altera Corporation
3
Preliminary
ASIC to FPGA Design Methodology & Guidelines
Design
Specification
Design Specification
Design specification consists of defining specifications, including the
following:
■
■
■
■
■
I/O standards, the number of I/O pins, and the location of the I/O
pins for various standards
Global clocks and their frequency requirement
Memory requirements
Test methodology
Selection of the FPGA family and device, including the speed grade
You must define each of the above specifications, because every FPGA
family supports different I/O standards, and the devices within a family
have different numbers of I/O pins.
I/O Specification
Selection of an FPGA device depends on considerations such as speed
grade, signaling technology, termination technology, available logic
elements (LEs), external reset, the number of global clock networks, and
most importantly, the number of I/O pins available and the supported
I/O standards. I/O resources vary depending upon the device and
family. To meet today’s challenges of design complexity, and also to
support complex I/O standards, Altera devices support a wide variety of
I/O standards.
The ASIC design methodology consists of instantiating I/O pads for a
design by specifying the technology I/O buffers in a Verilog HDL or
VHDL file to perform simulation and synthesis. The I/O file will be a toplevel file, and it instantiates the top-level design, as shown in Figure 3.
4
Preliminary
Altera Corporation
Design Specification
ASIC to FPGA Design Methodology & Guidelines
Figure 3. I/O Specification, with the Top-Level Design
io.v/vhd
oe
oe
top.v/vhd
oe
Module A
Module B
Module C
Module D
oe
oe
oe
oe
After the design is handed over to the foundry, the io.v and io.vhd files
are taken out and replaced with the technology I/O pads.
In FPGA design flow, third-party synthesis tools provide the option of
synthesizing the design with or without I/O pads. Synthesizing with I/O
pads provides a good estimate of the delay through the I/O buffers.
Actual I/O assignment is done within the Quartus II software, using the
Assignment Editor (Assignments menu), as shown in Figure 4. The
designer is given the option for the synthesized design, including I/O
pads, to be replaced by a specific I/O standard within the Quartus II
software, or a specific I/O standard can be added to the top level design.
These choices are both presented before place-and-route.
f
Altera Corporation
For additional information, see the Using the Assignment Editor in the
Quartus II Software white paper.
5
Preliminary
ASIC to FPGA Design Methodology & Guidelines
Design Specification
Figure 4. Quartus II I/O Assignment Editor
f
Altera FPGA devices support a wide variety of I/O standards. For
information on the supported I/O standards in a specific Altera device,
see the device-specific Altera data sheet.
Number of I/O Pins
You must determine the exact number of I/O pins used in the design.
This number will ultimately be used to decide the type of package and the
total number of pins required out of the package.
Location of I/O Pins
To minimize potential problems pertaining to board layout, I/O pin
location must be given careful consideration. Incorporating high-speed
signals in a board design requires minimizing via interconnects and
cross-over signals to simplify the routing process. In an FPGA flow,
because I/O banks support specific I/O standards, a designer must know
the different I/O standards that the device can support so that modules
in the design that require a certain I/O standard can be physically placed
near the corresponding I/O bank.
6
Preliminary
Altera Corporation
Design Specification
f
ASIC to FPGA Design Methodology & Guidelines
To learn more about the location of I/O banks and supported standards
within an I/O bank, see the device-specific Altera data sheets.
I/O Timing
I/O timing is affected by the I/O standard and the drive strength. To
meet I/O requirements, Altera FPGAs have I/O pins with registers. The
device uses these registers depending on the clock setup time (tSU ) and
clock-to-output delay (tCO) requirements. These registers are enabled by
setting constraints within the Quartus II software at the top level of the
design hierarchy, such as the Fast Input Register and Fast Output
Register, to determine register placement for I/O timing.
Number of Global Clocks
It is necessary to identify the number of clocks needed in a design in order
to determine what type of FPGA can be used. There is usually an upper
limit to the number of different clocks that can exist in a design. Clock tree
insertion is a manual process in an ASIC flow, whereas in an FPGA flow
it is an automatic process.
f
For more information on clock tree synthesis and clock requirements,
refer to “Clock Tree Synthesis” on page 42.
Memory Requirements
Identifying embedded dynamic random access memory (DRAM) and
SRAM capacity and speed requirements is an integral part in the
technology selection process and needs to be accomplished early in the
process. Stratix devices have up to 4 Mbits of embedded memory within
various sizes of memory blocks that can be used in different kinds of
applications. Memory capacity is higher in ASIC technology, but
integrating and testing this memory in an ASIC device requires
significantly more work. In an FPGA flow, Quartus II software tools
allow users to automatically arrange memory blocks to meet the system
requirement, and no user instantiation is necessary. In an ASIC flow,
integrating and testing embedded memory usually involves instantiating
a special module in the register transfer level (RTL) design, as well as
creating custom circuitry around the RAM blocks for testing.
f
Altera Corporation
For more information on memory requirements and available
configurations in an FPGA, refer to “Specification of External & Internal
Memory” on page 26.
7
Preliminary
ASIC to FPGA Design Methodology & Guidelines
Design Specification
Phase-Locked Loop (PLL) Requirements
Identifying the number of PLLs needed in the design impacts the type
and family of devices that can be selected in FPGA design flows. Inserting
PLL circuitry in an ASIC device is typically a manual process
accomplished by instantiating special PLL blocks in the design. PLLs are
instantiated in the FPGA design after creating a configurable PLL with
the MegaWizard Plug-In Manager tool available within the Quartus II
software. This process is usually automatic in the Quartus II software.
The number of PLLs available in FPGA technology is usually limited,
whereas PLL quantities are virtually unlimited in ASIC technology.
Stratix devices support up to 12 PLLs.
f
For more information on the number and type of PLLs supported in a
design, refer to the device-specific Altera data sheets.
Test Methodology
ASIC testing and fault coverage is an important aspect of the ASIC
development process. Scan insertion, built-in self-test (BIST), signature
analysis, IDDQ, and automatic test pattern generation (ATPG) all have to
be considered and analyzed within an ASIC design flow. ASIC testing
typically involves test vector generation, using ATPG tools to test the
device for manufacturing defects under the “single stuck-at” model.
ATPG is usually performed after completion of scan insertion. Scan
insertion is performed to improve the obtained fault coverage by
reducing the sequential testing problem to a combinatorial testing
problem. Using FPGA technology, you do not need to worry about any
device testing, because FPGA devices are pre-verified at the factory. In
these devices, most logic functions are exhaustively tested by
downloading different images into the device. Therefore, compared to
ASIC devices, FPGA devices are much more likely to be defect-free after
customer delivery.
Unlike FPGAs, where the top-level I/O pads can be synthesized by the
synthesis tools, and boundary scan is already incorporated for test
purposes, ASIC design flow generally requires extensive manual effort to
insert and simulate boundary scan on top of the device logic, often using
third-party EDA tools. Also, unlike FPGAs, ASIC test methodology
requires sequential propagation of test vectors through all flip-flops of a
design to identify single stuck-at-faults after device manufacturing. Due
to a large number of flip-flops, the scan chain is broken down into many
manageable chains, such that several chains can be tested in parallel on a
tester. Due to a high number of scan chains and the limited number of
I/O pins available on a package, an ASIC’s I/O pins are manually
multiplexed with scan chain outputs so that the same pins can be used as
functional pins in regular operation mode, and scan chain output pins in
test mode.
8
Preliminary
Altera Corporation
Design Development
ASIC to FPGA Design Methodology & Guidelines
Clock Frequencies
The circuit’s speed will be another deciding factor in choosing the
appropriate Altera FPGA for your design.
Number of Simultaneously Switching Outputs (SSOs)
The number and the placement of SSOs has a direct impact on the number
of power and ground pins that an ASIC requires. This is usually not an
issue with an FPGA, because the placement and number of power and
ground pins is fixed and assigned through the Quartus II software.
FPGA Device Sizing
You must estimate the potential size of the logic required so that power
and heat dissipation requirements can be estimated. After the size of the
logic is estimated and operating frequencies are recognized, the
approximate size and speed grade of the device can be selected.
Altera FPGAs also support vertical migration within the same package.
In this context, you can migrate between devices whose dedicated pins,
configuration pins, and power pins are the same for a given package
across device densities. For example, you can migrate a design from an
EP1S10B672C6 device to an EP1S20B672C6 device in the 672-pin package.
Power Requirement
Perform preliminary power analysis, based on the estimated logic size
and speed, to determine the device power requirement. This usually
leads to defining the device’s cooling and package requirements.
Design
Development
Design development is comprised of the following processes for both
FPGAs and ASICs.
■
■
■
■
■
Altera Corporation
Specification of the methodology (top-down or bottom-up)
RTL coding guidelines for better performance
Hierarchical design partitioning
Specification of the external and internal memory
IP availability and flow
9
Preliminary
ASIC to FPGA Design Methodology & Guidelines
Design Development
Specification of the Methodology
FPGA design methodology supports both top-down and bottom-up
design methodology. FPGA design flows support modular design
approaches for bottom-up methodology, and hierarchical design
partitioning for top-down design methodology, similar to the process
used for ASIC devices.
f
For more information on design planning and different design
approaches, see the following documents.
■
■
■
AN 226: Synplify & Quartus II Design Methodology
AN161: Using the LogicLock Methodology in the Quartus II Design
Software
AN 165: Synplify & Quartus II LogicLock Design Flow
RTL Coding Guidelines
Following simple design guidelines can improve device performance and
design debugging capabilities. This section explains the following
common design guidelines, which improve device performance by
taking advantage of the FPGA architecture.
■
■
■
■
■
■
■
■
■
■
■
Synchronous versus asynchronous designs
Synchronous versus asynchronous resets
Gated clocks versus clock enables
Using clock dividers
Efficient use of clock-enables
Using Altera PLLs to reduce clock skew
Using data pipelining
Using encoding schemes
Using look-ahead techniques
Using logic duplication
Using internal buses
Synchronous vs. Asynchronous Designs
The two RTL code examples in this section show the differences between
synchronous and asynchronous counters.
Synchronous Design
In synchronous designs, all registers are clocked by one global clock, a
primary input to the design. Synchronous designs are free from
combinatorial feedback loops and delay lines. The following RTL code is
an example of a synchronous counter.
10
Preliminary
Altera Corporation
Design Development
ASIC to FPGA Design Methodology & Guidelines
module sync_counter (clk,count_en,reset,count_out);
input clk,count_en,reset;
output [3:0] count_out;
reg [3:0] count_out;
always @(posedge clk) begin
if (~reset)
count_out = 4’b0000;
else if (count_en == 1’b1) begin
if (count_out == 4’b1111)
count_out = 4’b0000;
else
count_out = count_out + 1’b1;
end
end
endmodule
Asynchronous Design
Asynchronous behavior occurs when a signal passes between two
different clock domains due to different clock frequencies, gated clocks,
or a modular place-and-route approach where large blocks of logic
belonging to the same clock domain are not synchronized. The worst
problem caused by an asynchronous signal is potential meta-stability in
the destination register, because the signal state can be changing at the
same time as the active clock edge of the destination register. You can
avoid this meta-stability hazard by putting two extra stages of flip-flops
before the destination register. If the signal does cause meta-stability,
stabilize it by passing it through the two extra flip-flops before it reaches
its intended register. The other problem caused by asynchronous signals
is that the sampling clock at the receiving end may be too slow compared
to the initiating clock, which causes the signal to be completely missed. In
this scenario, a good handshake protocol must be implemented between
asynchronous blocks of a device to ensure proper device operation.
Figure 5 shows an example of an asynchronous design with multiple
clocks asynchronous to each other, and the signal passing from one clock
domain to another.
Altera Corporation
11
Preliminary
ASIC to FPGA Design Methodology & Guidelines
Design Development
Figure 5. An Asynchronous Design with Two Clocks
D
Q
CLK
clk 1
Note (1)
D
Q
CLK
Q
Q
clk 2
Note to Figure 5:
(1)
Clk1 and Clk2 have different clock sources.
Typical design styles that contribute to asynchronous circuits include the
following:
■
■
■
■
Gated clocks
Latch inferences
Multiple clocks
Derived clocks
One of the disadvantages of asynchronous logic is that the existence of
race conditions is an inherent problem with this type of logic. An example
of an asynchronous counter is shown in the RTL code below.
module async_counter(clk,count_en,reset,count_out);
input clk,count_en,reset;
output [3:0] count_out;
reg [3:0] count_out;
wire gated_clk = clk & count_en;
always @(posedge gated_clk or negedge reset) begin
if (~reset)
count_out <= 4’b0000;
else begin
if (count_out == 4’b1111)
count_out <= 4’b0000;
else
count_out <= count_out + 1’b1;
end
end
endmodule
12
Preliminary
Altera Corporation
Design Development
ASIC to FPGA Design Methodology & Guidelines
Synchronous vs. Asynchronous Resets
An asynchronous reset is defined as a way of clearing the contents of a
register, independent of the associated clock. ASIC libraries consist of
registers with and without a built-in reset/clear pin. A register with a
built-in reset/clear pin is bigger than a register without one.
ASIC designers like to use registers with no built-in reset pins to obtain
extra speed and area in a design by using an external gate on the data path
of the register for a reset. When the reset is routed through the data pin,
the clock must be running when the reset is asserted. Conversely, a
synchronous reset signal is treated as any other data signal, so that no
extra care is needed during the routing and timing optimization phase.
In ASIC designs, internally generated asynchronous resets from a state
machine can cause problems in scan testing. A typical problem results
from the shifting of test vectors through the flops of the state machine,
which triggers unintended resets. This is not an issue for FPGAs, which
are not scan tested.
Instead of gating the reset signal, if ASIC designers use registers with a
built-in reset/clear pin, then asynchronous reset is controlled by a
device’s primary input and buffered like a clock tree to reach all of the
asynchronous reset pins of all of the device registers. In ASICs, a reset tree
must be generated just like a clock tree. Use timing analysis or an
independent start signal to prevent different state machines from getting
out-of-sync immediately after the reset release.
In FPGAs, a reset tree is already in place. All registers have a built-in
asynchronous reset, so no area savings are attained by not utilizing this
reset capability. Safe design practice must be followed to ensure
everything is in synch immediately after the reset signal is released.
Gated Clocks vs. Clock Enables
Clock gating is used in ASIC designs for power optimization. For
example, if data has to be written to memory only when the write enable
signal is high, then the clock can be gated with the write enable so that
switching of the register takes place only when write enable is high.
Gated clocks can introduce glitches and clock incorrect data due to the
delay associated along the combinatorial path. The following RTL code is
an example of clock gating, and the corresponding schematic is shown in
Figure 6.
Altera Corporation
13
Preliminary
ASIC to FPGA Design Methodology & Guidelines
Design Development
module gated_clk(in,out,en,clk);
input in,clk,en;
output out;
reg out;
wire gate_clk;
assign gate_clk = clk & en;
always @(posedge gate_clk) begin
out <= in;
end
endmodule
Figure 6. Clock Gating
D
Q
Clk
Clk
En
gate_clock
The timing diagram shown in Figure 7 shows the glitch associated with
clock gating.
Figure 7. Issues Associated with Clock Gating
Clk
En
Clk_en
Output
14
Preliminary
Altera Corporation
Design Development
ASIC to FPGA Design Methodology & Guidelines
FPGAs have a clock enable pin that can be used to avoid the gate clocking.
Using the clock enable pin to disable the clocking of the flip-flop does not
mean that the flip-flop is not getting clocked. It merely means that the
current state of the flip-flop is clocked continuously. An example of this
implementation of the flip-flop is shown in Figure 8.
Figure 8. A Clock Enable Pin Disabling Clocking of a Flip Flop
DFF Cell
Din
D
Q
Q
CE
clk
clk
The following RTL code illustrates the modifications to the Figure 8
example, using the clock enable pin.
module clock_en(in,out,clk,data_en);
input in,clk,data_en;
output out;
reg out;
always @(posedge clk) begin
if (data_en)
out <= in;
else
out <= out;
end
endmodule
Figure 9 shows the schematic for this clock enable code.
Figure 9. FPGA Devices’ Built-in Clock Enable Resource
in
D
data_en
CE
clk
Altera Corporation
Q
out
CLK
15
Preliminary
ASIC to FPGA Design Methodology & Guidelines
Design Development
Using Clock Dividers
Feeding the Q bar output to the data input is a widely used concept of
clock division, known as ripple clock division. The following RTL code is
an example of this type of divider.
module ripple_clk_divider(clk,data_in,data_out);
input clk,data_in;
output data_out;
reg data_out;
reg temp_clk;
always @(posedge clk)
temp_clk <= ~temp_clk;
always @(posedge temp_clk)
data_out <= data_in;
endmodule
Figure 10 shows the schematic for this code example.
Figure 10. Ripple Clock Division
D
Q
data_in
D
Q
CE
temp_clk
clk
CLK
Q
CLK
Q
The clock enable resource is used to divide a clock to reduce the ripple
effect by synchronous clock division, as shown in the following RTL code
example.
16
Preliminary
Altera Corporation
Design Development
ASIC to FPGA Design Methodology & Guidelines
module clock_divider_ce(clk,data_in,data_out);
input clk,data_in;
output data_out;
reg data_out;
reg temp_clk;
always @(posedge clk)
temp_clk <= ~temp_clk;
always @(posedge clk) begin
if (temp_clk)
data_out <= data_in;
end
endmodule
Figure 11 shows the schematic for this code.
Figure 11. Synchronous Clock Division
D
Q
data_in D
Q
temp_clk CE
clk
CLK
Q
CLK
Q
Efficient Use of Clock Enables
There are situations in which the combinatorial logic that drives a
register’s clock enable pin might be in the critical path. Clock enables are
not accessible via the look-up table (LUT) in the same LE. Thus, if a logic
function drives the clock enable of a flip-flop, then the clock enable is
implemented in an external LUT with an associated logic and routing
delay. To decrease the number of layers of logic, it is sometimes possible
to implement the same clock enable in the data path with a multiplexer.
The following example shows RTL code that will reduce the number of
logic levels.
Altera Corporation
17
Preliminary
ASIC to FPGA Design Methodology & Guidelines
Design Development
module
inefficient_ce(clk,datain,rwn,csn,asn,dataout);
input
clk,datain,rwn,csn,asn;
output
dataout;
reg
dataout;
always @(posedge clk)
begin
if ((rwn == 1’b0) && (csn == 1’b0) && (asn ==
1’b0))
dataout <= datain;
end
endmodule
Figure 12 shows the schematic for the above code.
Figure 12. RTL Modification to Reduce the Number of Logic Levels
S
csn
Q
D
CE
dataout
clk
R
datain
in
out
in(0)
in(1)
rwn
asn
in
out
GND
out
in(0)
in
out
in(1)
out
Using the following RTL code, the number of logic levels that drive the
register’s clock enable can be reduced from one to zero LUTs.
18
Preliminary
Altera Corporation
Design Development
ASIC to FPGA Design Methodology & Guidelines
module efficient_ce(clk,datain,rwn,csn,asn,dataout);
inputclk,datain,rwn,csn,asn;
outputdataout;
reg
dataout;
always @(posedge clk)
begin
if (csn == 1’b0)
dataout <= (datain && (rwn == 1’b0) && (asn
== 1’b0))
|| (dataout && rwn == 1’b1)
|| (dataout && asn == 1’b1);
end
endmodule
Figure 13 shows the schematic for this RTL code.
Figure 13. Reduction of Logic Levels from One to Zero LUTs
dataout
S
in
csn
D
out
clk
R
in(0)
datain
rwn
Q
CE
in(0)
out
in(1)
in
in(0)
out
in(0)
asn
in(1)
in(0)
in
out
1
Altera Corporation
in(1)
in(1)
in(1)
out
out
GND
in(1)
in(0)
out
out
Unless gate clocking is the only alternative, implement the
design taking advantage of the clock enable resource.
19
Preliminary
ASIC to FPGA Design Methodology & Guidelines
Design Development
Using PLLs to Reduce Clock Skew
Altera PLLs can provide multiple clock sources to be used off-chip with
the minimum possible skew. The RTL implementation shown below, and
the schematic representation shown in Figure 14, describes how to drive
eight pins that are three times the input clock. The PLL is used to multiply
the input clock frequency by six, and a toggle flip-flop is used to obtain
three times the input frequency. The Quartus II software MegaWizard
Plug-in Manager generates the instantiated module pllx2, shown in
Figure 14.
module off_chip(clk100in,clk300out);
inputclk100in;
wireclk600;
output[7:0]clk300out;
reg
[7:0]clk300out;
reg
clk300int /* synthesis syn_preserve = 1 */;
pllx2 U1(
.inclock(clk100in),
.clock1(clk600));
always @(posedge clk600)
begin
clk300int <= ~clk300int;
end
always @(posedge clk600)
begin
clk300out[0]<=~clk300int;
clk300out[1]<=~clk300int;
clk300out[2]<=~clk300int;
clk300out[3]<=~clk300int;
clk300out[4]<=~clk300int;
clk300out[5]<=~clk300int;
clk300out[6]<=~clk300int;
clk300out[7]<=~clk300int;
end
endmodule
20
Preliminary
Altera Corporation
Design Development
ASIC to FPGA Design Methodology & Guidelines
Figure 14. Altera PLL as a Source to Multiple Clocks
To External
Device
Altera FPGA
clk300out[7..0]
External Clock
Source
S
pllx2
clk100in
inclock
D
Q
clock1
R
1
Unless gate clocking is the only alternative, implement the
design taking advantage of the clock-enable resource.
Using Data Pipelining
FPGAs have many registers because there is a flip-flop available in each
logic element, and these registers can be used to improve the device
performance without incurring any area penalty. Data pipelining is an
important concept that can be used to increase device performance by
inserting registers to break a long combinatorial path. Figure 15 shows
the concept of pipelining.
Figure 15. Illustration of Data Pipelining
D
Q
Combinatorial
Logic
D
Q
Q
1
Altera Corporation
Q
D
Q
Q
Combinatorial
Logic 1
D
Q
Q
Combinatorial
Logic 2
D
Q
Q
Even though data pipelining increases design performance, be
sure to equalize data and control path latency, and modify the
test benches to accommodate signal capture from the
introduced registers.
21
Preliminary
ASIC to FPGA Design Methodology & Guidelines
Design Development
Using Encoding Schemes
State machines are common in almost all present-day SOPC designs. A
state machine can be implemented by different encoding methods, such
as the commonly used one-hot binary encoding method. One-hot
encoding requires a storage element for each state, but in binary encoding
“n” storage elements can be used to represent 2n states. Decoding a highly
encoded binary state machine usually results in increased logic levels
between states, compared to a simple one-hot encoded state machine. The
advantage of one-hot encoding is that the states are already in the
decoded format and require a simple decoding logic, resulting in a
reduced number of logic levels between states.
1
Because FPGAs have many registers and also because one-hot
encoding produces better performance results for most designs,
implement the finite state machines (FSMs) with one-hot
encoding.
Using Look-Ahead Techniques
Look-ahead techniques force a portion of the large combinatorial logic
function into the previous clock cycle, and forces the remaining logic
function to be performed in next clock cycle. This technique, also know as
register balancing, balances the levels of logic between registers, resulting
in much faster execution. Thus, by splitting the combinatorial logic over
two clock cycles, both cycles will run faster and no latency is introduced
in the overall design, as shown in the following RTL code example.
module without_lookahead (clk,a,b,q,d);
input clk,d;
input [15:0] a,b;
output q;
reg q;
always @(posedge clk) begin
if (^a && (a == b)) // a and b are 16 bit registered
input busses
q <= d;
end
endmodule
A modification to the above example, to split the combinatorial logic
between two register stages, is shown in the code below.
22
Preliminary
Altera Corporation
Design Development
ASIC to FPGA Design Methodology & Guidelines
module with_lookahead (clk,a,b,q,d);
input clk,d;
input [15:0] a,b;
output q;
reg q;
reg ce0, ce1;
always @(posedge clk) begin
ce0 <= (a[7:0]==b[7:0]);
ce1 <= (a[15:8]==b[15:8]);
end
always @(posedge clk) begin
if (^a && (ce0 == 1’b1) && (ce1==1’b1))
q<=d;
end
endmodule
Using Logic Duplication
You can improve FPGA performance by minimizing the routing delays.
High fan-out is one of the sources of routing delays that can be minimized
by logic duplication. The Quartus II software supports logic duplication
as part of the netlist optimization. The following RTL code example
shows a technique to reduce the high fan-out.
f
For more information, see “Netlist Optimizations” on page 45.
module large_load(in,en,clk,out);
input [31:0] in;
input clk,en;
output [31:0] out;
reg [31:0] out;
always @(posedge clk) begin
if (en)
out <= in;
else
out <=32’bz;
end
endmodule
As shown above, signal en drives 32 tri-state buffers and contributes to
the routing delay. As shown in the modified RTL code below, the signal
en can be duplicated to drive a set of 16 tri-state buffers instead of 32.
However, synthesis tools see this replication as redundant, and tend to
optimize the replicated logic. Synthesis attributes are needed to keep the
intended replicated logic, as shown in the following code for both
LeonardoSpectrum™ and Synplify.
Altera Corporation
23
Preliminary
ASIC to FPGA Design Methodology & Guidelines
Design Development
module logic_duplicate(in,en,clk,out);
input [31:0] in;
input clk,en;
output [31:0] out;
reg [31:0] out;
reg en1 /* synthesis syn_preserve=1 */; // Synplify
reg en2 /* synthesis syn_preserve=1 */; // Synplify
//exemplar attribute en2 PRESERVE_DRIVER -value TRUE
// LS
//exemplar attribute en1 PRESERVE_DRIVER -value TRUE
// LS
always @(posedge clk) begin
en1 <= en;
en2 <= en;
end
always @(posedge clk) begin
if (en1)
out[31:16] <= in[31:16];
else
out[31:16] <=16’bz;
end
always @(posedge clk) begin
if (en2)
out[15:0] <= in[15:0];
else
out[15:0] <=16’bz;
end
endmodule
Using Internal Buses
Internal buses in ASIC devices allow various internal modules and
external devices to communicate. Within an ASIC device there can
potentially be many internal tri-state buses.
When creating a design for an FPGA, these internal tri-state buses get
converted into deep-wide multiplexers. However, FPGAs will support
tri-state buses through the I/O interface to communicate between various
on-board devices.
Figure 16 compares the implementation of deep-wide multiplexers and
internal buses in FPGAs and ASICs, respectively.
24
Preliminary
Altera Corporation
Design Development
ASIC to FPGA Design Methodology & Guidelines
Figure 16. Internal Buses in ASIC & FPGA Devices
Bus Arbiter
ASICs
FPGAs
TRI
Module A
Module B
Module C
Module A
TRI
TRI
Module B
Module C
The following RTL code converts an internal bus into multiplexers, and
the tri-state gate is converted to an OR gate.
module bustest (clk, ina, inb, inc, sel, outputvalue);
input ina, inb, inc,clk;
input [2:0] sel;
inout outputvalue;
//Internal Signals
reg reg_ina, reg_inb, reg_inc;
reg bus;
reg reg_bus;
always @(posedge clk)
begin
reg_ina <= ina;
reg_inb <= inb;
reg_inc <= inc;
reg_bus <= bus;
end
Altera Corporation
25
Preliminary
ASIC to FPGA Design Methodology & Guidelines
Design Development
always @(sel or reg_ina or reg_inb or reg_inc)
begin
case(sel)
3’b100 : begin
bus <= reg_ina;
end
3’b010 : begin
bus <= reg_inb;
end
3’b001 : begin
bus <= reg_inc;
end
default begin
bus <= 1’bZ;
end
endcase
end
assign outputvalue = reg_bus;
endmodule
Specification of External & Internal Memory
Memory block sizes and configurations are major considerations of SOPC
designs, and performance goals determine whether to have device
memory on- or off-chip. To satisfy the demand for large memory
requirements, Altera FPGAs have a diverse set of memory functions that
you can configure for different requirements using the
Quartus II software. The Stratix and Stratix GX device families feature
TriMatrix™ memory, which can be configured for different memory sizes,
as shown in Table 1.
f
26
Preliminary
To learn more about the available memory configurations and features of
Stratix and Stratix GX devices, see the Stratix GX FPGA Family Data Sheet
and the Stratix Device Family Data Sheet in the Stratix Device Handbook.
Altera Corporation
Design Development
ASIC to FPGA Design Methodology & Guidelines
Table 1. Summary of TriMatrix Memory Features
Feature
M512 Block
M4K Block
M-RAM Block
Performance
312 MHz
312 MHz
300 MHz
Total RAM bits (including parity bits)
576
4,608
589,824
Configurations
512 × 1
256 × 2
128 × 4
64 × 8
64 × 9
32 × 16
32 × 18
4K × 1
2K × 2
1K × 4
512 × 8
512 × 9
256 × 16
256 × 18
128 × 32
128 × 36
64K × 8
64K × 9
32K × 16
32K × 18
16K × 32
16K × 36
8K × 64
8K × 72
4K × 128
4K × 144
v
Parity bits
Byte enable
v
v
v
v
Single-port memory
v
v
v
Simple dual-port memory
v
v
v
v
v
Embedded shift register
v
v
ROM
v
v
FIFO buffer
v
v
v
Simple dual-port mixed width support
v
v
v
v
v
True dual-port memory
True dual-port mixed width support
Memory initialization (.mif)
Mixed-clock mode
Power-up condition
v
v
Outputs cleared
v
v
Outputs cleared
v
Outputs unknown
Memory Implementation — Flexibility & Efficiency
If a memory configuration does not fit in a single memory block, the
Quartus II software implements the required configuration by combining
two or more memory blocks. For example, if the design requires a
512 × 18-bit memory block, then the Quartus II software combines two
512 × 9-bit blocks. Similarly, the Quartus II software can implement a
32K × 36-bit RAM function by combining two 32K × 18-bit M-RAM
blocks, as shown in Figure 17.
Altera Corporation
27
Preliminary
ASIC to FPGA Design Methodology & Guidelines
Design Development
Figure 17. Configuring Memory Blocks
M-RAM Block
M4K Block
32k × 36
512 × 18
32k × 18
512 × 9
M-RAM Block
M4K Block
512 × 9
32k × 18
f
.
For more information on memory configurations and the Quartus II
software settings, see the Altera web site and AN 207: TriMatrix Memory
Selection Using the Quartus II Software.
Instantiating Altera RAM in Place of an ASIC RAM
Memory instantiation in an ASIC design flow consists of instantiating
memory models provided by the foundry. For example, Figure 18 shows
a simple dual-port memory interface.
Figure 18. Dual-Port RAM
DATA_IN
WR_ADDR
WRITE
CLK
DATA_OUT
DUAL-PORT RAM
512 × 8
RD_ADDR
READ
The following RTL code shows the instantiation of the dual port memory
interface shown in Figure 18.
module my_design (port declarations)
input and output declarations
module instantiations
dual_port_ram dual_port_ram_inst
.clk (clk),
28
Preliminary
Altera Corporation
Design Development
ASIC to FPGA Design Methodology & Guidelines
.data_in (data_in),
.rd_address (rd_address),
.read (read),
.data_out (data_out),
.wr_address (wr_address),
.write (write));
The memory model provided by the ASIC foundry contains timing
information for synthesis and a behavioral model for simulation. If the
design has to be ported to an Altera FPGA, you must configure the
memory requirement from within the Altera MegaWizard Plug-In
Manager. The wizard then generates the following set of files.
■
■
■
memory.v/memory.vhd
This file is a wrapper file that instantiates the Altera library-ofparameterized-modules (LPM) memory module, used for
simulation.
memory_bb.v/memory_bb.vhd
This file is a module declaration used for synthesis.
memory_inst.v/memory_inst.vhd
This file is a memory module instantiation example.
Timing information for the memory models, also known as “clear box,”
is available in the Quartus II software version 3.0, and can be used with
Synplify Pro 7.3 for synthesis. Using these models, Synplify Pro can
perform an accurate area and timing estimate.
Simulation
Altera provides simulation models of the memory modules generated
using the MegaWizard Plug-In Manager for all the device families. Refer
to the application notes describing the simulation flow for your choice of
a simulator tool.
Synthesis
Refer to the application notes describing the synthesis flow for your
choice of synthesis tool.
f
The Quartus II software Help system provides additional information on
Guidelines for Altera Megafunctions & LPM Functions.
Place-and-Route
Similar to ASIC design flow, the Quartus II software place-and-route tool
instantiates the appropriate memory module and performs
place-and-route of the design.
f
Altera Corporation
The Quartus II software Help system provides additional information on
the instantiation of LPM Megafunctions.
29
Preliminary
ASIC to FPGA Design Methodology & Guidelines
1
Design Development
When designing with Altera FPGAs, you must identify the
device during the design planning process, because the
available memory resource varies according to the family and
device.
The Altera Stratix and Stratix GX device families only support
synchronous memory interface. If the design already contains
asynchronous references to memory blocks, see AN 210: Converting
Memory from Asynchronous to Synchronous for Stratix & Stratix GX Designs.
Design Development Tools
Altera provides you with the following development tools, all of which
are available within the Quartus II software.
■
■
■
SOPC Builder
DSP Builder
MegaWizard Plug-In Manager
You can use these tools to ease SOPC design development, which will
reduce time-to-market. The following sections present an overview of
these tools, and their advantages.
SOPC Builder
SOPC Builder provides a standardized, graphical environment for
creating SOPC designs composed of components including CPUs,
memory interfaces, standard peripherals, and user-defined peripherals.
SOPC Builder enables the combination of components such as embedded
processors, standard peripherals, IP cores, on-chip memory, interfaces to
off-chip memory, and user-defined logic into a custom system module.
This tool generates a single system module that instantiates these
components, and automatically generates the necessary bus logic to
connect them. SOPC Builder also automatically generates Verilog Design
Files (.v) and/or VHDL Design Files (.vhd) describing the system, which
can be used as a simulation model.
Figure 19 shows an example of a typical set of system components with
which SOPC Builder can be used to generate a system-level module in
minutes. SOPC Builder will also generate the test bench.
30
Preliminary
Altera Corporation
Design Development
ASIC to FPGA Design Methodology & Guidelines
Figure 19. Elements of a Typical SOPC Design
Nios
Processor
Memory
Controller
Memory
DMA
Controller
UART
PCI
Figure 20 shows the SOPC Builder user interface after a set of
peripherals are connected to a Nios® processor.
f
Altera Corporation
For more details about the SOPC Builder, refer to documentation at
www.altera.com/literature.
31
Preliminary
ASIC to FPGA Design Methodology & Guidelines
Design Development
Figure 20. Altera’s SOPC Builder
DSP Builder
Altera’s DSP Builder integrates the Quartus II software with high-level
algorithmic development tools such as MATLAB and Simulink software.
The DSP Builder software is a set of building blocks that shortens DSP
design cycles by helping designers create the hardware representation of
a DSP design in an algorithm-friendly development environment.
Figure 21 shows a simple design, built from concept-to-implementation,
using the DSP Builder. Several ready-to-use mathematical functions are
available within both the MATLAB and Simulink tools, along with
simulation models, which can be used to create a schematic. DSP Builder
can generate a VHDL description of the design, along with the test bench.
DSP Builder also has links to third-party EDA tools, which can be used
along with the Quartus II software to perform synthesis, simulation, and
place-and-route.
32
Preliminary
Altera Corporation
Design Development
ASIC to FPGA Design Methodology & Guidelines
Figure 21. DSP Builder Design Flow
Drag and drop
Generate VHDL,
Synthesize, and
place-and-route
The MegaWizard Plug-In Manager
The MegaWizard Plug-In Manager, available in the Quartus II software,
helps you create or modify design files that contain custom
megafunction variations, which can then be instantiated in a design file.
These custom megafunction variations are based on Altera
megafunctions, including LPM functions, ranging from simple Boolean
gates to complex memory structures.
Figure 22 shows the selections required in the MegaWizard Plug-In
Manager to access and edit Altera’s LPM elements.
Altera Corporation
33
Preliminary
ASIC to FPGA Design Methodology & Guidelines
Design Development
Figure 22. Configuring a MegaWizard Plug-in Manager LPM Component
Desired HDL type
Family selection
List of available
megafunctions
Configurable parameters
IP Availability & Flow
IP blocks that are pre-verified reduce the design time, solve many timeto-market issues, and simplify verification.
The MegaWizard portal extension to the MegaWizard Plug-In Manager
allows designers to directly install and use the MegaCore® function
available at the IP MegaStore™ site without leaving the Quartus II
environment. Designers can select an available MegaCore function in the
list, obtain information about the selected MegaCore function, register
with the Altera IP MegaStore site, download the MegaCore function,
install and launch the corresponding MegaWizard plug-in (if provided),
all from within the Quartus II software. This feature provides access to
the most up-to-date versions of the MegaCore functions at the IP
MegaStore.
34
Preliminary
Altera Corporation
Design Development
ASIC to FPGA Design Methodology & Guidelines
Figure 23 shows the initial screen of the MegaWizard Plug-In Manager
and a sample list of some of the IP cores available.
Figure 23. IP Creation
List of available
megafunctions
Options to download
the IP
IP Instantiation
If the IP blocks available through the Altera MegaWizard Plug-In
Manager are in an encrypted format and cannot be read by the thirdparty synthesis tools, instantiation of the IP will require black boxing the
IP within synthesis tools, and then reading the encrypted IP into the
Quartus II software to perform the place-and-route. The concept of black
box is shown in Figure 24.
f
Altera Corporation
For more information about the IP design flow, refer to the
documentation at www.altera.com/literature/lit-ip.html.
35
Preliminary
ASIC to FPGA Design Methodology & Guidelines
Design Development
Figure 24. IP Instantiation
MegaWizard IP Block - FIR Filter
Top.v
Counter
Filter
Black
Box
Adder
xin
Z-1
C-0
Z-1
X
C-1
Z-1
X
C-2
Z-1
X
C-3
Tapped
Delay
Line
X
Multipliers
by
Coefficient
Adder
Tree
yout
Because LPMs are not synthesized by synthesis tools, ‘black-boxing’ is required.
Functional Simulation
Functional simulation verifies the functionality of the RTL design. The
following is the list of supported third-party EDA tools.
■
■
■
NC-SIM and Verilog-XL, from Cadence
VCS and VSS, from Synopsys
ModelSim®, from Mentor Graphics®
Simulation can also be performed using the Quartus II software native
simulator.
Synthesis
Synthesizing is the process of converting a design representation from
RTL code to gate level. Figure 25 shows a typical synthesis flow.
Compared to ASIC tools, FPGA synthesis tools are much easier to use in
terms of complexity and scripting. Industry-standard FPGA synthesis
tools support a limited set of constraints compared to their ASIC
counterparts, but the FPGA tools support all of the popular ASIC
synthesis techniques, including the following:
■
■
■
f
36
Preliminary
Top-down or bottom-up approach
Modular design flow
Scripting
For more information on design methodologies, see AN 226: Synplify &
Quartus II Design Methodology.
Altera Corporation
Design Development
ASIC to FPGA Design Methodology & Guidelines
Figure 25. Typical Synthesis Design Flow
VHDL
Verilog HDL
Functional
Simulation
Tech.lib
Constraints
Synplify Pro,
Precision Synthesis,
or Quartus II
Gate-level
Simulation
Synthesis
Post-synthesis
simulation files
Technologyspecific netlist
Forward-annotated
timing constraints
Quartus II
software
Post Placeand-Route
Simulation
Placeand-Route
Post place-and-route
simulation files
No
Timing
requirements
met?
Yes
Configuration files
( .sof / .pof )
Configure device
Scripting
FPGA synthesis tools such as Synplify and Synplify Pro from Synplicity,
and Precision Synthesis from Mentor Graphics, support scripts written in
tool command language (Tcl) and Synopsys DC (SDC) format. The
following sample scripts are typically used to perform basic synthesis
steps, as applied to Synplify/Synplify Pro.
Altera Corporation
37
Preliminary
ASIC to FPGA Design Methodology & Guidelines
Design Development
project -new C:/test/fpga_risc8/risc.prj // opens a
new project risc.prj
add_file C:/test/fpga_risc8/src/risc8.v // adds the
source file to the project
set_option -technology APEX20KE // sets target
technology
set_option -frequency 50.000000 // sets the speed
set_option -num_critical_paths 1 // sets number of
critical paths to be reported
set_option -num_startend_points 1 // sets number of
start points for the timing report
set_option -pipe 1 // sets the option of pipelining to
TRUE
set_option -retiming 1 // sets the option of retiming
to TRUE
add_file -constraint test.sdc // reads the Synopsys DC
style commands from test.sdc
project -run synthesis // perform synthesis
project -save C:/test/fpga_risc8/risc.prj // Save the
project
SDC Scripts
The following is a sample of SDC scripts for setting constraints.
define_clock -name {clk} -freq 70.000 -clockgroup
default_clkgroup //Clock creation
define_output_delay {dds_out[7:0]} 3.00 -ref clk:r
//set output delay with respect to clk //rising
define_input_delay {expaddr[6:0]} 2.00 -ref clk:f
//set input delay with respect to clk
//falling
define_multicycle_path -from {i:ctl[0]} -to
{i:accum[9:0]} 2 //multi cycle path setting
define_false_path -from {i:ddsstep[7:0]} -to
{i:sinout[7:0]} //false path setting
define_attribute {expread} syn_maxfan {4}
// Max fanout attribute setting
Command Line Execution
The following is a sample of an SDC script for command line execution.
synplify_pro –batch <file_name>.tcl
f
38
Preliminary
For more information on scripting and supported SDC commands, see
the Synplify, or LeonardoSpectrum user and reference manual.
Altera Corporation
Test Synthesis
ASIC to FPGA Design Methodology & Guidelines
Third-Party EDA Tool Support for Synthesis
You can use the following third-party tools to perform synthesis:
■
■
■
Synplify Pro and Synplify, from Synplicity
LeonardoSpectrum and Precision Synthesis, from Mentor Graphics
FPGA Compiler II, from Synopsys
In addition to the above tools, you can also use the Quartus II software to
perform synthesis.
Test Synthesis
Vendors test FPGA devices for manufacturing defects, eliminating the
need for memory BIST, SCAN insertion, or any other tests typically used
to detect manufacturing faults. This completely eliminates the complex
task of test synthesis.
Gate-Level
Simulation &
Timing Analysis
Industry standard EDA tools for simulation can perform gate-level
simulation, and a simulation library for all of the Altera FPGA device
families is shipped with the Quartus II software.
f
For additional detailed information on performing simulation, see the
following application notes, available at www.altera.com.
■
■
■
AN 239: Using VCS with the Quartus II Software
AN 204: Using ModelSim-Altera in a Quartus II Design Flow
AN:197: Using NC-SIM with the Quartus II Software
Altera FPGA design flow also supports PrimeTime from Synopsys to
perform static timing analysis, in addition to the timing analysis
supported by the Quartus II software. The PrimeTime library for most of
the FPGA families is also included with the Quartus II software.
Place-and-Route
FPGA design place-and-route is performed using the Quartus II software
from Altera. The Quartus II software reads standard EDIF, VHDL, and
Verilog HDL netlist files, and generates VHDL and Verilog HDL netlist
files, for a convenient interface to other industry-standard EDA tools.
The Quartus II software can perform different types of netlist
optimizations to meet performance and area requirements. In addition to
netlist optimization, the Quartus II software also supports the
LogicLock™ block-based design methodology, and has advanced features
to enable fast input/fast output registers to meet the tSU and tCO
requirements.
Altera Corporation
39
Preliminary
ASIC to FPGA Design Methodology & Guidelines
Post Place-and-Route Verification
The Timing Closure floorplan within the Quartus II software allows
designers to assign logic a row or a column within a device or logic cell
(including embedded and I/O cells). This floorplan also allows designers
to view logic placement made by the Fitter and/or by user assignments,
make-and-view LogicLock region assignments, and view critical path
information, physical timing estimates, and routing congestion.
Figure 26 compares FPGA and ASIC place-and-route processes, which
shows that performing the back-end tasks for FPGAs has many
advantages compared to ASIC. These advantages include the overall
simplicity of the process, minimal EDA tools cost, and a reduction of
time-to-market.
Figure 26. Comparison of FPGA & ASIC Place-and-Route Process
FPGA
Quartus II
Software
Place-and-Route,
Re-Synthesis, I/O
Placement, Timing
Analysis
ASIC
Placement and
Physical
Optimization
Formal Verification,
Post-layout Timing
Analysis
Clock tree
synthesis
Skew and
Timing Analysis
Routing
DRC/ERC,
Manual Layout Fixes,
Hold Time Fixes
Sign off
Signal Integrity
Sign off
Post Place-andRoute
Verification
IR drops, Xtalk
.
Verification of the place-and-route netlist is performed to verify timing
requirements, logic equivalence, and to estimate power consumption.
The Quartus II software produces netlists that support various thirdparty EDA tools to perform different verification operations.
Gate Level Simulation
After you select your simulation tool, the Quartus II software outputs a
netlist with the timing information, automatically ensuring the netlist is
compatible with the chosen tool. The Quartus II software has the option
of generating either of the following:
■
■
40
Preliminary
The Verilog HDL or VHDL netlist to perform functional simulation
The Verilog HDL or VHDL netlist, along with timing information
(the SDF file) to perform timing simulation
Altera Corporation
Post Place-and-Route Verification
ASIC to FPGA Design Methodology & Guidelines
Static Timing Analysis
If you select the PrimeTime EDA tool for timing analysis, the Quartus II
software will output the Verilog HDL or VHDL gate-level netlist, the SDF
file, and the Tcl script to run static timing analysis within PrimeTime.
Additionally, PrimeTime libraries are shipped with the Quartus II
software.
Formal Verification
The Quartus II software supports formal verification, which uses
exhaustive mathematical techniques to verify the design functionality.
Within the Altera FPGA design flow, formal verification can be
performed between the synthesized and the Quartus II post-fit netlists.
Presently, formal verification is only supported for designs synthesized
with the Synplify Pro tool.
f
For additional information about formal verification support, see
AN 296: Using Verplex Conformal LEC for Formal Verification of Design
Functionality.
Power Estimation
Altera FPGA design flow supports two methods of power estimation.
The first method uses a design utility called the online power calculator.
Estimated power consumption can be calculated for various devices
based on typical conditions. This online calculator is found in the design
utility section of each device family.
f
For more information on this calculator, go to www.altera.com/products/
devices/dev-index.jsp.
The Quartus II software also supports the Model Technology™
ModelSim® software to perform a simulation that includes power
estimation data. The designer can specify that the Quartus II software
should include power estimation data in the Verilog Output File (.vo) or
VHDL Output File (.vho) generated for APEX™ 20KE, APEX 20KC, and
Mercury™ devices. After simulating the design in the ModelSim software,
the ModelSim software uses the power estimation data to generate a
Power Input File (.pwf) that can be used within the Quartus II software
for estimating the design’s power consumption.
f
Altera Corporation
Additional information on power estimation is available in the
Quartus II software Help .
41
Preliminary
ASIC to FPGA Design Methodology & Guidelines
Post Place-and-Route Verification
Clock Tree Synthesis
Clock tree synthesis (CTS) is an important step in the ASIC design
methodology, and is performed after placement. CTS builds a clock
network to reduce the clock skew between registers in the design. CTS
synthesis is either performed by third-party EDA tools that have the
capacity to perform physical synthesis, or by the foundry tools. CTS can
take a great deal of time, and also may require several iterations before
the required clock skew is achieved.
Altera FPGAs have dedicated clock networks that are either global or
regional, along with the PLL resources. These clock networks have high
drive strength and minimal skew, allowing the FPGA design flow to be
free from CTS, resulting in reduced design cycle time. The number of
global clock resources available depends upon the family and device, and
you must know how many global clock resources are required, and select
the device accordingly. Figure 27 shows the Stratix device’s global and
regional clock resources.
f
To learn more about PLLs and global clock networks, see the appropriate
device data sheet.
Figure 27. Stratix Clock Resources
CLK[15..12]
CLK[15..12]
CLK[11..8]
CLK[3..0]
CLK[3..0]
CLK[7..4]
1
Unlike ASIC devices, which can have an almost unlimited
number of global clock pins and location choices, FPGA devices
have a fixed number of these pins and location choices.
Ensure that the number of clock resources required for the design
implementation is equal to, or less than, the global clock and fast clock
resources available within an FPGA device. Clock pins are also fixed in
location, and design placement should be done to take advantage of the
physical location of the clock pin.
42
Preliminary
Altera Corporation
Quartus II Software Features
Quartus II
Software
Features
ASIC to FPGA Design Methodology & Guidelines
The Quartus II software is a fully integrated, architecture-independent
package for designing logic with Altera programmable logic devices
(PLDs). The Quartus II software offers a full spectrum of logic design
capabilities, including the following:
■
■
■
■
■
■
■
■
■
■
■
LogicLock incremental design
Tcl scripts
SignalTap® II Logic Analyzer
SignalProbe™ debugging technology
Design entry using schematics, block diagrams, AHDL, VHDL, and
Verilog HDL
Floorplan editing
Powerful logic synthesis including retiming
Functional and timing simulation
Timing analysis
Software source file importing, creation, and linking to produce
programming files
Device programming and verification
LogicLock
Quartus II software enables a block-based approach using the
LogicLock feature. Using this technique, you can independently create
and implement each logic module, and integrate all optimized modules
into the top-level design. Figure 28 compares traditional design flow
with LogicLock design flow.
Figure 28. Comparison Between Traditional & LogicLock Design Flow.
Traditional Design Flow
LogicLock Design Flow
Design
Design, Optimize,
and Verify
Integrate
Verify
Optimize
Integrate
Verify
Altera Corporation
43
Preliminary
ASIC to FPGA Design Methodology & Guidelines
f
Quartus II Software Features
For additional information on LogicLock methodology and usage, see
the following application notes, and the Quartus II Help.
■
■
■
AN 161: Using the LogicLock Methodology in the Quartus II Design
Software
AN 164: LeonardoSpectrum™ and Quartus II LogicLock Design Flow
AN 165: Synplify and Quartus II LogicLock Design Flow
Register Packing
Register packing combines logic cells that only use the register and logic
cells that only use the LUT into a single logic cell to reduce the area, as
shown in Figure 29. The following four options are available for register
packing.
■
■
■
■
Off: Does not pack registers.
Normal: Default setting packs registers when this is not expected to
hurt timing.
Minimize Area: The Quartus II software will aggressively pack
registers to reduce area. The software will not pack registers that are
part of a carry-and-cascade chain.
Minimize Area with Chains: The Quartus II software will
aggressively pack registers, including those that are a component of
carry-and-cascade chains, to reduce area. This option is only
available for Stratix, Stratix GX, and Cyclone devices.
Figure 29. Register Packing
LUT1
REG1
f
44
Preliminary
+
LUT2
REG2
=
LUT
REG
For additional information about register packing, see
AN 297: Optimizing Performance Using the Quartus II Software and the
Quartus II Help.
Altera Corporation
Quartus II Software Features
ASIC to FPGA Design Methodology & Guidelines
Netlist Optimizations
The Quartus II software includes netlist optimization options to further
optimize the design after synthesis and before place-and-route. The
option can be applied regardless of the synthesis tool used. Depending on
the design, some options may have more of an effect than others. The
following three options are available.
■
■
■
WYSIWYG primitive re-synthesis
Gate-level register re-timing
Logic element duplication
Figure 30 shows each of these options.
Altera Corporation
45
Preliminary
ASIC to FPGA Design Methodology & Guidelines
Quartus II Software Features
Figure 30. Netlist Optimization Methods
WYSIWYG Primitive Re-synthesis
Un-Map
LE
ATOM
Netlist
LE
LE
Re-Map
Place
&
Route
LE
LE
Gate-level Register Re-timing
D
Q
10 ns
D
Q
5 ns
D
Q
D
Q
7 ns
D
Q
8 ns
D
Q
Logic Element Duplication
LE
LE
LE
LE
LE
LE
LE
LE
LE
LE
LE
46
Preliminary
LE
Altera Corporation
Quartus II Software Features
ASIC to FPGA Design Methodology & Guidelines
Tcl Scripting
The QuartusII software supports Tcl scripting along with Application
Program Interface (API) functions that can be used as Tcl commands. You
can generate a Tcl Script File (.tcl) from an existing project created
through the Quartus II software, or you can use Tcl language templates
and Quartus II Tcl templates to create Tcl scripts.
You can run Tcl scripts or individual Tcl commands from within the
Quartus II software and other EDA software tools such as Synplify from
Synplicity, and LeonardoSpectrum from Mentor Graphics.
The following Tcl code presents a sample Tcl script to perform basic
place-and-route.
# Load Quartus II Tcl Project package
package require ::quartus::project
}
# Create and open the project
if {[project_exists test]} {
project_open -cmp clock_sync test
} else {
project_new test
}
# Project Assignments
set_global_assignment -name "VQM_FILE"
"..\\synplify\\clock_sync.vqm"
set_global_assignment -name "COMPILER_SETTINGS"
"clock_sync"
# Assign the Synthesis, Simulation and Timing
Verification tools
set_global_assignment -name "EDA_SIMULATION_TOOL" section_id "test" "ModelSim (Verilog HDL output from
Quartus II)"
set_global_assignment -name
"EDA_TIMING_ANALYSIS_TOOL" -section_id "test"
"PrimeTime (Verilog HDL output from Quartus II)"
# Input/Output format and reading
set_global_assignment -name "EDA_INPUT_DATA_FORMAT" section_id "eda_design_synthesis" "EDIF"
set_global_assignment -name "EDA_OUTPUT_DATA_FORMAT" section_id "eda_design_synthesis" "EDIF"
# Device & Family selection
Altera Corporation
47
Preliminary
ASIC to FPGA Design Methodology & Guidelines
Quartus II Software Features
set_global_assignment -name "FAMILY" "Stratix"
set_global_assignment -name "DEVICE" "EP1S40F1508C5"
# Timing Specifications
set_global_assignment
MHz"
set_global_assignment
set_global_assignment
set_global_assignment
-name "FMAX_REQUIREMENT" "40.0
-name "TCO_REQUIREMENT" "25ns"
-name "TPD_REQUIREMENT" "25ns"
-name "TSU_REQUIREMENT" "25ns"
# Compile the project and
# exit using "qexit" if there is an error
if [catch {qexec "quartus_fit $project_name"} result]
{
qexit -error
}
if [catch {qexec "quartus_tan $project_name"} result]
{
qexit -error
}
#------ Report Fmax from report ------#
set actual_fmax [get_fmax_from_report]
puts ""
puts "----------------------------------------------"
puts "Required Fmax: $required_fmax Actual Fmax:
$actual_fmax"
puts "----------------------------------------------"
#------ Close Project ------#
project_close
f
For more information on Tcl scripting, see AN 195: Scripting with Tcl in the
Quartus II Software and also the Quartus II software Help.
Modular Quartus II Software
With the Quartus II software version 3.0, Altera provides the flexibility of
modular executables. Modular executables provide separate programs
for each step in an FPGA design flow, from synthesis through simulation
and programming. Some benefits of modular executables include
reduced memory requirements and improved performance, commandline control over each step of the design flow, easy integration with
scripted design flows including Makefiles, and complete compatibility
with the easy-to-use Quartus II graphical user interface (GUI) based
design flow.
48
Preliminary
Altera Corporation
Quartus II Software Features
f
ASIC to FPGA Design Methodology & Guidelines
To learn more about how to use modular executables, see
AN 309: Command-line Scripting in the Quartus II Software.
In-System Verification
In ASIC devices, test points and pins are used to probe various nodes to
help identify the source of a problem. With these test points/pins and a
logic analyzer, you can effectively determine the source of most
problems.
This test method is also used in FPGAs. However, with the reconfigurability of FPGAs you have many more advantages. Altera
FPGAs support the SignalProbe™ utility, which uses an automated
process similar to the points-and-pins method to route internal nodes to
unused I/O pins. A logic analyzer can also be used with these
SignalProbe pins, similar to the ASIC methodolog. However, since the
FPGA is re-configurable, you can embed a SignalTap II logic analyzer
into the FPGA itself.
SignalTap II Embedded Logic Analyzer
The SignalTap II embedded logic analyzer captures the data of internal
nodes and transfers the data real time, via a download cable, to the
Quartus II software to be viewed in a waveform. This embedded logic
analyzer supports trigger positions, and trigger events, in the same way
that a standard external logic analyzer would.
The SignalTap II logic analyzer allows you to modify the nodes they
choose to tap on the fly. This tool also supports various clock domains in
your design by instantiating multiple SignalTap II instances into a single
device.
Because the SignalTap II logic analyzer uses the programming interface
to communicate back to the computer, you do not need extra test pins.
Acquired data is saved to the device’s internal RAM and then streamed
off-device via the JTAG communication port. This is advantageous when
BGA packages are used and access to available pins is difficult, or
impossible.
f
Altera Corporation
To learn about the SignalTap II features, see AN 280: Design Verification
Using the SignalTap II Embedded Logic Analyzer.
49
Preliminary
ASIC to FPGA Design Methodology & Guidelines
Device Programming (Prototype)
SignalProbe Features
The SignalProbe feature supports incremental routing, allowing
designers to quickly route signals to I/O pins for efficient signal
verification without affecting the design. When device memory is limited
and there is no access to a JTAG communication port, the SignalProbe
feature can perform signal debugging and hardware verification.
With the Quartus II software, you can change the selection of nodes that
you want routed to pre-specified SignalProbe pins. With these routed
SignalProbe pins, you can then use a logic analyzer to analyze your
selected internal nodes.
The SignalProbe feature allows you to re-route your signals in as little as
5% of the time required for a full re-compilation. The SignalProbe feature
gives quick access to internal design signals to Altera designers for
system-level debugging.
f
For more information on the SignalProbe feature, refer to the Quartus II
Help.
Boundary Scan Description Language (BSDL) Testing
Both ASIC and FPGA devices use BSDL Testing to ensure pin
connectivity and functionality. Advances in surface-mount packaging
and PC board manufacturing have resulted in smaller boards, making
traditional test methods, such as external test probes, harder to
implement. BSDL testing can test pin connections without using physical
test probes and capture functional data while a device is operating
normally.
All Altera devices support BSDL testing. In Altera FPGA devices, a JTAG
(TAP) controller is already embedded into the device and does not
require any engineering work, unlike ASIC devices. Altera also provides
a BSDL file for each device in each family. These BSDL files can be found
on the Altera web site.
f
Device
Programming
(Prototype)
50
Preliminary
For more information about BSDL testing, see AN 39: IEEE 1149.1 (JTAG)
Boundary-Scan Testing in Altera Devices.
One difference between ASIC and FPGA devices is that, unlike an FPGA
device, functionality of an ASIC device cannot be modified. Once an
ASIC device returns from the foundry, the device functionality is fixed
and not much can be done to it. Alternatively, FPGA devices can be reconfigured many times, providing much more flexibility.
Altera Corporation
Device Programming (Prototype)
ASIC to FPGA Design Methodology & Guidelines
The ability to change the functionality of a device and add enhancements
and fixes is common in the prototyping stages. There are various ways
FPGA devices can be configured.
Altera SRAM-based FPGA devices (like Stratix, Stratix GX, and Cyclone
devices) can be configured in many ways, as shown in Table 2.
Table 2. Altera SRAM-based Configuration Schemes
Configuration Scheme
Configuration Device
Device Family
Typical USe
APEX II,
APEX 20K,
Mercury, ACEX 1K,
FLEX 10K,
Configuration with the EPC16, EPC8, EPC2, EPC1, or
EPC1441 configuration devices.
FLEX 6000
Configuration with EPC1 or EPC1441 configuration devices.
Passive Serial (PS)
APEX II,
APEX 20K,
Mercury, ACEX 1K,
FLEX 10K,
FLEX 6000
Configuration with a serial synchronous microprocessor
interface and the MasterBlasterTM communications cable or
ByteBlasterMVTM parallel port download cable. (1)
Passive Parallel
Synchronous (PPS)
APEX II,
APEX 20K,
Mercury, ACEX 1K,
FLEX 10K
Configuration with a parallel synchronous microprocessor
interface.
Fast Passive Parallel
(FPP)
APEX II
Configuration with a parallel synchronous configuration device
or microprocessor interface where 8 bits of configuration data
are loaded on every clock cycle. Eight times faster than PPS.
Passive Parallel
Asynchronous (PPA)
APEX II,
APEX 20K,
Mercury, ACEX 1K,
FLEX 10K
Configuration with a parallel asynchronous microprocessor
interface. In this scheme, the microprocessor treats the target
device as memory.
Passive Serial
Asynchronous (PSA)
FLEX 6000
Configuration with a serial asynchronous microprocessor
interface.
Joint Test Action Group
(JTAG)
APEX II,
APEX 20K,
Mercury, ACEX 1K,
FLEX 10K
Configuration through the IEEE Std. 1149.1 (JTAG) pins. (2)
Notes to Table 1:
(1)
(2)
The MasterBlaster communications cable uses a standard PC serial or universal serial bus (USB) hardware interface
to download configuration data to APEX II, APEX 20K, Mercury, ACEX 1K, FLEX 10K, and FLEX 6000 devices. It
supports operation with V CC at 5.0 V, 3.3 V, 2.5 V, or 1.8 V and is supported by the QuartusTM II software. For more
information on the MasterBlaster cable, see the MasterBlaster Serial/USB Communications Cable Data Sheet.
Although you cannot configure FLEX 6000 devices through the JTAG pins, you can perform JTAG boundary-scan
testing.
Altera Corporation
51
Preliminary
ASIC to FPGA Design Methodology & Guidelines
Device Programming (Prototype)
In general, there are two ways of configuring your FPGA device.
■
■
Use a download cable
Use a configuration device
Using a Download Cable
A download cable is used more frequently in the prototyping stages
when using SRAM-based FPGA devices. By definition, the prototyping
stage will likely require design changes and fixes. In ASIC designs, the
device must be removed and replaced by a new ASIC device, or the board
is scrapped entirely. With FPGA devices, you can reprogram the device
while it is on the board.
The Quartus II software allows you to re-configure your device via a
download cable. There are various download cables that can be used to
connect to the serial, USB, or parallel port of your computer. You can
connect it in a JTAG configuration or in a passive serial configuration.
f
For more information, see AN 116: Configuring SRAM-Based LUT Devices.
Using a Configuration Device
SRAM-based FPGA devices must be configured each time the board
powers up, and Altera provides various sizes of configuration devices
that serve this purpose.
These EEPROM-based configuration devices store the configuration data,
and configure the intended device. One typical scenario is to configure
your FPGA device with a configuration device passive serially. When the
board is powered on, the configuration device(s) will automatically send
a serial data stream to the FPGA device and configure it.
Alternatively, you can also store the configuration data in on-board
memory and use another PLD (typically an EPROM-based device, or a
microprocessor) to act as the controller. When the board is powered on,
the controller device will send the configuration data from memory to the
targeted FPGA device.
These configuration methods are ideal for the final production boards.
f
52
Preliminary
For more information about the various configuration schemes, see
AN116: Configuring SRAM-Based LUT Devices.
Altera Corporation
Hardcopy Devices
Hardcopy
Devices
ASIC to FPGA Design Methodology & Guidelines
Altera offers a cost effective ASIC-like alternative to FPGA devices in
their HardCopy™ device family. HardCopy devices extend the flexibility
of high-density FPGAs to a cost-effective high-volume production
solution. The conversion process from an Altera FPGA to a HardCopy
device offers seamless migration of a high density system-on-aprogrammable-chip to a low-cost alternative device with minimal risk.
Using HardCopy devices, Altera’s SOPC solutions can be leveraged from
prototype to production, while reducing cost and speeding time-tomarket.
A significant benefit of HardCopy devices is that designers do not need
to be involved in the device conversion process. Unlike ASIC device
development, the HardCopy design does not require generation of test
benches, test vectors, or timing and functional simulation. The HardCopy
conversion process only requires the Quartus II software-generated
output files. Altera performs the conversion and delivers functional
prototypes in as few as eight weeks.
f
For more information on the HardCopy alternative, see the
HardCopy Device Handbook, Volume 1.
Figure 31 shows the HardCopy design flow after the Quartus II software
has generated the .sof file.
Altera Corporation
53
Preliminary
ASIC to FPGA Design Methodology & Guidelines
Hardcopy Devices
Figure 31. HardCopy Design Flow
Start Quartus II
HardCopy Flow
HardCopy
Stratix
FPGA
Family?
HardCopy APEX 20KE /
APEX20KC
Select Stratix
hardcopy_fpga
_prototype
device
Compile to the
Stratix
hardcopy_fpga_prototype
device
Run
Performance
Estimation?
No
Yes
Run
HardCopy
Optimization
Wizard
Close
Quartus
FPGA
Project
Open
Quartus
HardCopy
Optimization
Project
HardCopy
Floorplan
Compile
(to a
HardCopy
Stratix
Device)
Estimated f
of
MAX
HardCopy Stratix
Device
54
Preliminary
Placement
Information
for Actual
HardCopy
Run
HardCopy
Files
Wizard
.qar File for
Delivery to
Altera
Altera Corporation
FPGA Financial Benefit
ASIC to FPGA Design Methodology & Guidelines
FPGA Financial
Benefit
Altera also provides a means of calculating the financial advantages of
FPGAs as compared to ASICs. The tool will take inputs parameters and
provide an estimate of total projected costs and revenues. This tool is
available on the Altera web site.
Conclusion
As can be seen by comparing ASIC design flow with FPGA design
flow, as shown in Figure 2 on page 3, ASIC design flow consists of
more steps compared to those required for FPGAs, and more EDA tools
must be used to perform the associated tasks.
ASIC Design Flow Disadvantages
This section lists some of the disadvantages of ASIC design flow
compared to FPGA design flow.
■
Initial planning and physical design expertise
ASIC design methodology consists of initial physical planning, as
follows.
●
●
●
I/O placement
Memory placement
Power and physical planning
This gives visibility and control of a device’s area and speed early in
the design phase. To perform the above tasks, design teams should
also have designers with physical design expertise.
Altera Corporation
■
Test synthesis
Scan cells and memory BIST logic adds area and test overhead to
detect manufacturing faults.
■
Foundry involvement
Back-end tasks involve the physical design planning between the
design team and the foundry. Foundry personnel perform placeand-route of the design, and verification and timing analysis is done
by the design team. There will be iterations before the final sign-off,
resulting in delayed time-to-market.
■
EDA tools
Additional number of EDA tools adds to cost and complexity
compared to FPGA design flow.
■
NRE charges
Design re-spin results in increased NRE charges. Also, the masks are
extremely expensive, costing approximately $850,000 IRUDmask
manufactured using µm WHFKQRORJ\
55
Preliminary
ASIC to FPGA Design Methodology & Guidelines
Conclusion
■
Firmware development
Testing of the firmware has to be done after the design fabrication,
and any modifications to the firmware results in delayed time-tomarket.
■
Time-to-market and high risk
Length of the ASIC design cycle is considerably longer compared to
FPGA design flow and any modifications to the design results in
high NRE charges, and delayed time-to-market.
FPGA Design Flow Advantages
This section lists the advantages of FPGA design methodology.
56
Preliminary
■
No design-for-test tools needed
FPGA devices are free from manufacturing defects, resulting in a
scan-free design and eliminating formal verification between the
original design and the scan design.
■
Early prototyping and volume production
FPGA designs can have a lead time of 10 to 12 weeks compared to
ASICs. After a device is chosen, parts can also be ordered well before
the design is complete, ensuring they are available as soon as the
designer is ready.
■
HardCopy
Altera offers the HardCopy family, an ASIC-like alternative to FPGA
devices, which is cost effective compared to ASIC.
■
Firmware development
Firmware development and testing can be done as soon as the design
is prototyped and modifications, either with the firmware or the
design, can be accomplished with a short turnaround time.
■
NRE charges and functional flexibility
Because devices can be re-configured, there is no NRE cost
associated with design re-spins, and design modifications are easy to
accommodate—taking minutes as compared to months.
■
Place-and-route
The Quartus II software from Altera performs place-and-route and
generates all the files necessary to build the FPGA, eliminating the
requirement of coordinating with an external foundry. The
Quartus II software also generates the necessary netlists that can be
used with industry-standard simulation and timing verification
EDA tools.
Altera Corporation
ASIC to FPGA Design Methodology & Guidelines
101 Innovation Drive
San Jose, CA 95134
(408) 544-7000
www.altera.com
Applications Hotline:
(800) 800-EPLD
Literature Services:
[email protected]
58
Preliminary
Conclusion
Copyright © 2003 Altera Corporation. All rights reserved. Altera, The Programmable Solutions Company,
the stylized Altera logo, specific device designations, and all other words and logos that are identified as
trademarks and/or service marks are, unless noted otherwise, the trademarks and service marks of Altera
Corporation in the U.S. and other countries. All other product or service names are the property of their respective holders. Altera products are protected under numerous U.S. and foreign patents and pending
applications, maskwork rights, and copyrights. Altera warrants performance of its semiconductor products
to current specifications in accordance with Altera's standard warranty, but reserves the right to make changes to any products and services at any time without notice. Altera assumes no responsibility or liability
arising out of the application or use of any information, product, or service described
herein except as expressly agreed to in writing by Altera Corporation. Altera customers
are advised to obtain the latest version of device specifications before relying on any published information and before placing orders for products or services.
Printed on recycled paper
Altera Corporation