Virtex-5 FPGA XtremeDSP Design Considerations

Virtex-5 FPGA Coding
Techniques, Part 2
FPGA and ASIC Technology
Comparison - 1
© 2009 Xilinx, Inc. All Rights Reserved
Curriculum Path
Intro to VHDL or Intro to Verilog (3 days)
FPGA and ASIC Technology Comparison: FPGA vs. ASIC Design Flow; ASIC to FPGA Coding Conversion; Virtex-5 FPGA Coding Techniques; Spartan-3 FPGA Coding Techniques
Fundamentals of FPGA Design (1 day)
Designing for Performance (2 days)
Advanced FPGA Implementation (2 days)
Welcome
This REL will help you build Virtex®-5 FPGA designs
that are compact and run at high speed
We will show you how to avoid some of the
most common design mistakes
This content is essential if you have never
coded a design for the Virtex-5 FPGA or are
converting an ASIC design
After completing this module, you
will be able to:
Optimize ASIC code for implementation in a
Virtex-5 FPGA
Build a checklist of tips for optimizing your
code for the Virtex-5 FPGA
Clock Enable
Control the use of clock enables from the code
Code them only when needed
If a low-fanout CE is necessary, use synthesis attributes to control the use
of control signals at the signal or module level
• Do not use global switches to turn off the use of CEs
– Results in an average of 25-percent LUT increase
Consider using alternative coding methods for low-fanout clock enables
• This will map the CE as an input to the LUT
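Conventional clock enable (maps to the flip-flop CE pin):
VHDL:
if (CE = '1') then
  Q <= A;
end if;
Verilog:
if (CE)
  Q <= A;
Equivalent coding with CE as a LUT input (load A when CE is high, otherwise hold Q):
VHDL:
Q <= (CE AND A) OR ((not CE) AND Q);
Verilog:
Q <= (CE & A) | (~CE & Q);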
Tip: Code low-fanout CEs for a LUT input. This will enable the
flip-flop to be part of a larger control set
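The two clock-enable coding styles can be compared side by side in one small module. This is a minimal sketch only: the module and signal names (ce_styles, q_ce, q_lut) are hypothetical, and a synthesis tool may still recognize the second form and extract a clock enable unless its control-signal attributes say otherwise.

```verilog
// Sketch: two functionally identical registers, coded two ways.
// All names here are illustrative, not taken from the course material.
module ce_styles (
    input  wire clk,
    input  wire ce,
    input  wire a,
    output reg  q_ce,   // conventional clock-enable coding
    output reg  q_lut   // clock enable folded into the data path
);

    // Style 1: ce is normally mapped to the flip-flop CE pin,
    // so this register belongs to the control set {clk, ce}.
    always @(posedge clk) begin
        if (ce)
            q_ce <= a;
    end

    // Style 2: load a when ce is high, otherwise hold the old value.
    // ce becomes an ordinary LUT input, so the register can share
    // the larger control set that has no clock enable.
    always @(posedge clk) begin
        q_lut <= (ce & a) | (~ce & q_lut);
    end

endmodule
```

Both registers behave the same in simulation; the difference is only in how the enable is expected to map onto the slice.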
Map Report
MAP will report on the number of control sets for a particular design
(Virtex-5 FPGA only)
Running MAP with the -detail switch will give a detailed analysis of the
number of unique control signals (can be a large report)
Control sets with few members (few flip-flops per control set) are of concern
Global Clock Enable
To gate entire clock domains for power reduction, use the clock-enabled
global buffer resource BUFGCE
For applications that only pause the clock on small areas of the design, use
the clock enable pin of the FPGA register
Tip: This will save general routing resources
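As an illustration of the two cases, the sketch below gates a whole domain with a BUFGCE and pauses a single register with its clock enable pin. It assumes the Xilinx unisim library is available for the BUFGCE primitive (ports I, CE, O); the module and signal names are hypothetical.

```verilog
// Sketch: domain-level gating versus a local register pause.
// Requires the Xilinx unisim library for BUFGCE; names are illustrative.
module clock_gating_example (
    input  wire clk_in,
    input  wire domain_en,  // gates the entire clock domain
    input  wire local_en,   // pauses one register only
    input  wire d,
    output reg  q_domain,
    output reg  q_local
);

    wire clk_gated;

    // Clock-enabled global buffer: stops the clock for everything
    // driven by clk_gated while domain_en is low.
    BUFGCE u_bufgce (
        .I  (clk_in),
        .CE (domain_en),
        .O  (clk_gated)
    );

    // Every register on clk_gated stops toggling when the domain is disabled.
    always @(posedge clk_gated)
        q_domain <= d;

    // For a small area of the design, use the register clock enable instead.
    always @(posedge clk_in)
        if (local_en)
            q_local <= d;

endmodule
```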
DSP Slice
Use adder chains instead of adder trees
Adder trees tend to have varying size
• This usually makes larger adders in the last stages, which increases logic levels
The Virtex-5 FPGA uses adder chains, which obtain peak performance and
use minimal power
• Requires pipelining
• Adds latency
[Figure: adder chain compared with adder tree]
Tip: Use adder chains instead of adder trees
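A pipelined adder chain for four operands might be coded as follows. This is a sketch under assumptions: the module name, parameter W, and signal names are hypothetical, and whether the chain is packed into cascaded DSP slices depends on the synthesis tool and operand widths. Each stage adds one operand to the running sum, and later operands are delayed so that all four values belong to the same input sample.

```verilog
// Sketch: four-input pipelined adder chain (result valid 3 clocks after input).
// Names and widths are illustrative.
module adder_chain4 #(
    parameter W = 16
) (
    input  wire                clk,
    input  wire signed [W-1:0] a,
    input  wire signed [W-1:0] b,
    input  wire signed [W-1:0] c,
    input  wire signed [W-1:0] d,
    output reg  signed [W+1:0] sum
);

    reg signed [W-1:0] c_d1;        // c delayed one cycle
    reg signed [W-1:0] d_d1, d_d2;  // d delayed one and two cycles
    reg signed [W:0]   s1;          // a + b
    reg signed [W+1:0] s2;          // a + b + c

    always @(posedge clk) begin
        // Stage 1: first adder in the chain
        s1   <= a + b;
        c_d1 <= c;
        d_d1 <= d;

        // Stage 2: add the next operand to the running sum
        s2   <= s1 + c_d1;
        d_d2 <= d_d1;

        // Stage 3: last adder in the chain
        sum  <= s2 + d_d2;
    end

endmodule
```

Every adder in the chain has two operands and a register on its output, which is the trade described above: uniform, short logic levels in exchange for pipelining and added latency.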
Block RAM
Avoid “read before write” mode for fastest performance
The mode is set by how you code the memory for inference, or by the options
you choose when instantiating it from the CORE Generator™ tool
Synplify and other third-party synthesis tools can insert bypass logic to
prevent a possible mismatch error between your RTL and hardware
behavior
Intended to force RAM outputs to a known value when read and write
operations occur on the same memory cell
If you know this will never happen, you can use an attribute to prevent this
logic from being added and degrading your performance
• attribute syn_ramstyle of mem : signal is "no_rw_check";
Tip: Infer or instantiate the memory that is most appropriate
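One common way to infer a block RAM that is not in "read before write" mode is the write-first template sketched below. The module name and the AW/DW parameters are hypothetical, and the exact inference result depends on the synthesis tool, so treat this as an illustration rather than a guaranteed mapping.

```verilog
// Sketch: single-port RAM coded for write-first behavior.
// On a write, the new data appears on dout instead of the old contents.
module ram_write_first #(
    parameter AW = 10,  // address width (illustrative)
    parameter DW = 18   // data width (illustrative)
) (
    input  wire          clk,
    input  wire          we,
    input  wire [AW-1:0] addr,
    input  wire [DW-1:0] din,
    output reg  [DW-1:0] dout
);

    reg [DW-1:0] mem [0:(1<<AW)-1];

    always @(posedge clk) begin
        if (we) begin
            mem[addr] <= din;
            dout      <= din;        // write-first: output follows the written data
        end else begin
            dout      <= mem[addr];  // synchronous read
        end
    end

endmodule
```

By contrast, assigning dout <= mem[addr] unconditionally in the same clocked block as the write describes read-before-write behavior, which is the mode the slide recommends avoiding when speed matters.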
I/O Registers
IOB registers provide fixed setup and clock-to-output times
Fastest way to capture input data and clock data off the device
IOB registers can make it difficult to meet internal timing
Their use can lengthen route delays to internal logic
Only use IOB registers when it is necessary to meet I/O timing
• It is best to allow your synthesis tool to put registers into IOBs
based on timing constraints (if your tool supports this).
• Otherwise complete the following steps…
1) Disable global I/O register usage in your synthesis tool
2) Disable the Map option to pack registers into IOBs (PAR)
3) Selectively move registers into IOB with a UCF attribute
Tip: Only use IOB registers when necessary to meet I/O timing
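Step 3 refers to a UCF attribute; the same IOB constraint can also be attached in the HDL source, which keeps the intent next to the register it applies to. The sketch below assumes a synthesis tool that honors the IOB attribute in Verilog (XST documents this form); the module and signal names are hypothetical.

```verilog
// Sketch: selectively request IOB packing for one input register.
// Assumes the synthesis tool passes the IOB attribute through to MAP.
module iob_capture (
    input  wire clk,
    input  wire pad_in,   // signal arriving from the device pin
    output wire data_q
);

    // Only this register is marked for the IOB; global I/O register
    // packing is assumed to be disabled as in steps 1 and 2.
    (* IOB = "TRUE" *) reg in_reg;

    always @(posedge clk)
        in_reg <= pad_in;

    assign data_q = in_reg;

endmodule
```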
Design Hierarchy
Register all inputs and outputs to each hierarchical block
Or at least register the outputs
Place all I/O components at the top level
This includes I/O registers, DDR, SERDES, and delay elements
If not, place them in one block of hierarchy
Any logic that needs to be placed in a single resource (such as a single
DSP slice) should be contained in a single hierarchical block
Any logic that needs the synthesis tool to use resource sharing should be
placed in a single hierarchical block
Manually duplicate registers with high fanout at a hierarchical boundary
Tip: Following these guidelines ensures that your design is
less likely to interfere with design optimization and
incremental design practices
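A hierarchical block that follows the first guideline registers its inputs and its outputs, so no combinational path crosses the block boundary. The sketch below is illustrative only; the module name, the parameter W, and the adder inside the block are hypothetical.

```verilog
// Sketch: hierarchical block with registered inputs and outputs.
// Timing paths start and end inside the block, not across its boundary.
module stage_block #(
    parameter W = 8
) (
    input  wire         clk,
    input  wire [W-1:0] din_a,
    input  wire [W-1:0] din_b,
    output reg  [W:0]   dout
);

    reg [W-1:0] a_r;
    reg [W-1:0] b_r;

    always @(posedge clk) begin
        // Input registers
        a_r  <= din_a;
        b_r  <= din_b;
        // Block logic, captured in the output register
        dout <= a_r + b_r;
    end

endmodule
```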
Synthesis Options
Replicate registers with high fan-out
This allows high fan-out logic to be moved closer to destinations
This can be determined from a timing report
Manual duplication or replication constraints with the synthesis tools should
be applied
Retiming option should be used, especially if design has been pipelined
Pipelining is still encouraged, but not as essential
Tip: Duplicate high fan-out logic, pipeline as needed, and if
you pipeline use retiming
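Manual duplication of a high fan-out register could look like the sketch below. It assumes XST-style attributes (equivalent_register_removal) to stop the tool from merging the two copies again; other synthesis tools use different attribute names. All module and signal names are hypothetical.

```verilog
// Sketch: one select signal duplicated so each copy drives half the loads.
// The attribute keeps the synthesis tool from merging the equivalent registers.
module dup_select (
    input  wire       clk,
    input  wire       sel_in,
    input  wire [7:0] a_lo,
    input  wire [7:0] b_lo,
    input  wire [7:0] a_hi,
    input  wire [7:0] b_hi,
    output reg  [7:0] q_lo,
    output reg  [7:0] q_hi
);

    (* equivalent_register_removal = "no" *) reg sel_a;  // copy for the low byte
    (* equivalent_register_removal = "no" *) reg sel_b;  // copy for the high byte

    always @(posedge clk) begin
        sel_a <= sel_in;
        sel_b <= sel_in;
    end

    // Each copy can be placed close to the logic it drives.
    always @(posedge clk) begin
        q_lo <= sel_a ? a_lo : b_lo;
        q_hi <= sel_b ? a_hi : b_hi;
    end

endmodule
```

Where the tool supports it, a replication constraint such as a maximum fan-out limit achieves the same result without changing the source.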
Synthesis Options
Overconstraining during synthesis can significantly increase register use
Seen as an average increase from 1–5 percent
Do NOT over-constrain during synthesis
Global optimization can lead to mixed results
Can achieve ~10 percent flip-flop reduction
•
Gives back much of that (and sometimes more) due to control signals
FSM optimization
Turning off FSM optimization can yield a small flip-flop savings
One-hot encoding is not as useful
Do NOT use slice or LUT compression switches
In some cases, latch-thrus are used and consume registers
Tip: Do NOT over-constrain and do NOT use slice or LUT
compression
Synthesis Options Summary
To help meet your timing objectives…
Turn ON logic replication and retiming
Turn OFF resource sharing
Turn ON logic optimization (widening deep data paths)
Turn OFF FSM optimization
Do NOT over constrain during synthesis
Do NOT use slice or LUT compression switches
These synthesis options make the design larger, but save FFs
and give the PAR algorithms more flexibility to meet timing
Easiest Designs to Migrate to the Virtex-5 FPGA
Designs that can utilize the new hard IP
EMAC, DSP slice, block RAM, PowerPC® 440 processor, and PCI™
technology, for example
Low-power designs that use the dedicated IP
“Slow” designs
Designs with several LUT levels generally see greater speed due to the
LUT6 and improved routing architecture
Tip: Add as much IP to your design as you can
Toughest Designs to Migrate to the Virtex-5 FPGA
Structural designs
Designs that have not been coded properly (as just discussed)
Designs that have NOT been resynthesized
Designs that use many old netlists and cores from previous architectures
Some types of DSP designs
Heavily pipelined designs
What is in common?
They were not optimized!
Tip: Use the coding techniques described in these recorded
modules and you will get the high-speed design you hoped for
Common Questions
“Why can’t I code how I want to?”
You can. As long as it is synthesizable (RTL), Xilinx can build it. This
module highlights some of the lesser known trade-offs of coding styles in
terms of area, power, and performance.
“Shouldn’t the tools be able to make my code optimal?”
Some coding styles make this more difficult
•
While FPGAs are programmable, the underlying dedicated hardware is fixed
Common Questions
“The Virtex-5 FPGA should always be a speed grade faster than the
Virtex-4 FPGA, right?”
No, this is not always true, particularly for heavily pipelined designs.
“This design easily fit in the Virtex-4 FPGA and now it can’t fit in the
Virtex-5 FPGA. What’s wrong?”
Check how many control sets your design has. If you have too many, you
may need to evaluate your use of control signals. Also, check that your
cores and use of the dedicated hardware are optimal.
“Why can’t the software just optimize my inverters across a partition?”
Remember that partitions are there to preserve hierarchy and parts of your
design. Allowing any tool to selectively remove an option is counterintuitive.
Summary
Follow our synthesis recommendations…
Turn ON logic replication and retiming
Turn OFF resource sharing
Turn ON logic optimization (widening deep data paths)
Turn OFF FSM optimization
Do NOT over constrain during synthesis
Do NOT use slice or LUT compression switches
Be careful with coding unnecessary clock enables
IOB registers can make it more difficult to meet internal timing
Follow our directions to use the IOB registers only for IO timing
Follow our guidelines to ensure that your design does not interfere with
design optimization and incremental design practices
Where Can I Learn More?
Xilinx online documents
www.support.xilinx.com
• White papers for reference
– WP231 – HDL Coding Practices to Accelerate Design Performance
– WP248 – Retargeting Guidelines for Virtex-5 FPGAs
– WP275 – Get your Priorities Right – Make your Design Up to 50% Smaller
• User guides for reference
– UG193 – Virtex-5 FPGA XtremeDSP Design Considerations
• Software Manuals (found from the web or the Help menu)
– Constraints Guide
Trademark Information
Xilinx is disclosing this Document and Intellectual Property (hereinafter “the Design”) to you for use in the development of designs to operate on, or interface
with Xilinx FPGAs. Except as stated herein, none of the Design may be copied, reproduced, distributed, republished, downloaded, displayed, posted, or
transmitted in any form or by any means including, but not limited to, electronic, mechanical, photocopying, recording, or otherwise, without the prior written
consent of Xilinx. Any unauthorized use of the Design may violate copyright laws, trademark laws, the laws of privacy and publicity, and communications
regulations and statutes.
Xilinx does not assume any liability arising out of the application or use of the Design; nor does Xilinx convey any license under its patents, copyrights, or any
rights of others. You are responsible for obtaining any rights you may require for your use or implementation of the Design. Xilinx reserves the right to make
changes, at any time, to the Design as deemed desirable in the sole discretion of Xilinx. Xilinx assumes no obligation to correct any errors contained herein or
to advise you of any correction if such be made. Xilinx will not assume any liability for the accuracy or correctness of any engineering or technical support or
assistance provided to you in connection with the Design.
THE DESIGN IS PROVIDED “AS IS” WITH ALL FAULTS, AND THE ENTIRE RISK AS TO ITS FUNCTION AND IMPLEMENTATION IS WITH
YOU. YOU ACKNOWLEDGE AND AGREE THAT YOU HAVE NOT RELIED ON ANY ORAL OR WRITTEN INFORMATION OR ADVICE,
WHETHER GIVEN BY XILINX, OR ITS AGENTS OR EMPLOYEES. XILINX MAKES NO OTHER WARRANTIES, WHETHER EXPRESS, IMPLIED,
OR STATUTORY, REGARDING THE DESIGN, INCLUDING ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE, TITLE, AND NONINFRINGEMENT OF THIRD-PARTY RIGHTS.
IN NO EVENT WILL XILINX BE LIABLE FOR ANY CONSEQUENTIAL, INDIRECT, EXEMPLARY, SPECIAL, OR INCIDENTAL DAMAGES,
INCLUDING ANY LOST DATA AND LOST PROFITS, ARISING FROM OR RELATING TO YOUR USE OF THE DESIGN, EVEN IF YOU HAVE
BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. THE TOTAL CUMULATIVE LIABILITY OF XILINX IN CONNECTION WITH
YOUR USE OF THE DESIGN, WHETHER IN CONTRACT OR TORT OR OTHERWISE, WILL IN NO EVENT EXCEED THE AMOUNT OF FEES
PAID BY YOU TO XILINX HEREUNDER FOR USE OF THE DESIGN. YOU ACKNOWLEDGE THAT THE FEES, IF ANY, REFLECT THE
ALLOCATION OF RISK SET FORTH IN THIS AGREEMENT AND THAT XILINX WOULD NOT MAKE AVAILABLE THE DESIGN TO YOU
WITHOUT THESE LIMITATIONS OF LIABILITY.
The Design is not designed or intended for use in the development of on-line control equipment in hazardous environments requiring fail-safe controls, such as
in the operation of nuclear facilities, aircraft navigation or communications systems, air traffic control, life support, or weapons systems (“High-Risk
Applications”). Xilinx specifically disclaims any express or implied warranties of fitness for such High-Risk Applications. You represent that use of the
Design in such High-Risk Applications is fully at your risk.
© 2009 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc. All other
trademarks are the property of their respective owners. PCI, PCIe and PCI Express are trademarks of PCI-SIG and used under license. The PowerPC name and
logo are registered trademarks of IBM Corp. and used under license. All other trademarks are the property of their respective owners.