QuickPlay: Software-Defined FPGA Platforms for Hardware-Augmented Applications
QuickPlay is a unique design tool that allows software and hardware designers to develop and
implement systems that include custom FPGA hardware, while doing no hardware design and
requiring little to no hardware expertise. While QuickPlay has been streamlined for use by those
without a hardware background, it can also dramatically improve a hardware engineer’s
productivity.
Many tools have pursued this goal in the past without success, and that history means that
systems engineers may well look critically at any tool, like QuickPlay, that promises such a
capability. The purpose of this whitepaper is to explain how QuickPlay differs from what has
come before, and why it is capable of achieving this elusive, yet coveted, goal.
The discussion starts by defining the design problem that QuickPlay addresses and then
examines the challenges of solving that problem, including critical areas where past attempts have
fallen short of the promised goal. We then review the key characteristics that allow QuickPlay to
succeed, followed by a high-level overview of how one designs a system using QuickPlay.
The Case for FPGAs
While CPUs are common, flexible, and familiar, CPU performance has struggled to keep up with
the demands of increasingly complex algorithms and exploding volumes of data. Fueling the
Internet of Things (IoT) and Big Data clearly requires new computing paradigms.
FPGAs, on the other hand, provide unrivalled performance while maintaining the kind of flexibility
that software developers are used to. FPGAs allow hardware implementations that can be
designed and redesigned without the expense of custom silicon.
In addition, energy consumption has become a leading consideration, whether for controlling
operating costs in a data center or for maintaining the life of a battery. FPGAs can provide
targeted functions that achieve their performance using much less energy than a CPU would
require.
Finally, FPGAs provide a broad range of hardware options that are typically not available with a
CPU. System developers using FPGAs can make more flexible decisions regarding I/O and memory,
and have much more control over raw speed, overall bandwidth, and latency. QuickPlay leverages
these resources, turning FPGAs into software-defined platforms that yield hardware benefits with
no hardware design work.
QuickPlay: Software-Defined FPGA Platforms for Hardware-Augmented Applications – November, 2015
www.quickplay.io
The Tension between Software and Hardware
Many more systems could benefit from higher hardware content, and yet there is strong
resistance to using hardware. The reason is that hardware design requires a very different skill set
and thought process from software design.
Software designers focus on functionality: how is my data being manipulated? It is flow-oriented
and algorithmic in nature. Hardware designers, by contrast, focus on structure. Which
components are required? How should they be configured? How will they be interconnected and
synchronized? This includes signals and busses and clocks and resets, which have no software
counterparts.
The following table summarizes some of the key differences between software and hardware
design.
| | Software Design | Hardware Design |
|---|---|---|
| Programming model | Functional models (control flow, data flow): software languages specify a sequence of data manipulations. | Structural specification: hardware languages specify how specific components (CPU, memory, bus, DSP, engines, peripherals, etc.) should be assembled. |
| Abstraction | High: software languages abstract away the underlying execution engine. | Low: hardware languages define the execution engine (Boolean gates, registers, state machines, wires, clocks, resets). |
| Concurrency | Sequential: software naturally represents sequential operation. There's limited syntactical or semantic support for concurrency. | Massively parallel: all specified structures execute concurrently. |
| Notion of time | Untimed: software languages have no concept of time. Timing is established by the execution engine. | Explicitly timed: hardware timing is explicitly defined by the hardware designer through the instantiation of clocked registers. |
| Languages | Universal: C, C++, Java, etc. | Specialized: Verilog, VHDL, SDC, etc. |
| Design tools | Ubiquitous: easy-to-use open-source software tools are generally available for free. | Specialized: hardware tools tend to be proprietary, expensive, and complex. |
| Verification | Easy: software verification is done at the source level. | Extremely difficult: hardware verification involves low-level analysis, typically using electrical waveforms. |
Conceptually, it’s hard to imagine software and hardware design being more different.
For this reason, hardware design is typically done by engineers with a different mindset, using
different tools from those used by software engineers. Such expertise is scarce. Its scarcity
prevents many companies without hardware expertise from embarking down the path of
leveraging custom hardware, and keeps companies that do have it from expanding their use of
custom hardware.
From a practical standpoint, hardware design departments tend to be organized away from
software groups, and it’s a truism that the two operate in isolated silos with minimal
communication. That’s improving as companies become more efficient, but the fact remains that
including hardware design in a project has traditionally meant significant extra effort, cost, and
delay.
Design visionaries have long dreamt of a means whereby a software engineer could create
hardware without transforming into a hardware designer. This requires a significant level of
abstraction, and it is explicitly the problem that QuickPlay solves. QuickPlay is not the first attempt
at developing such a system design tool; it’s merely the first that has succeeded.
A History of Partial Solutions
Because software engineers deal with functions that operate on data, it’s natural that their
attempts to create hardware will have a similar focus. Specifically, the hardware they will want to
create will, in theory, be functionally isomorphic with the original software function.
Let’s imagine an algorithm that can be broken into two functions, which we’ll call Function 1 and
Function 2. From a software designer’s standpoint, this is simple and can be represented as shown
in Figure 1.
Figure 1 - Functions to be performed on data
In a software version, a program would call Function1(), followed by a separate call to
Function2(). If we want to accelerate these two software functions in hardware, we need to
create custom hardware from the software. If we want to automate this, then we need a tool that
can make this transformation.
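To make the software version concrete, here is a minimal C++ sketch. The bodies of Function1() and Function2() are hypothetical placeholders (a scaling step and an offset step). Only the structure, one call followed by a separate call, comes from the text above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical placeholder for Function 1: scale every sample in the buffer.
std::vector<int> Function1(const std::vector<int>& in) {
    std::vector<int> out;
    out.reserve(in.size());
    for (int x : in) out.push_back(2 * x);
    return out;
}

// Hypothetical placeholder for Function 2: add an offset to every sample.
std::vector<int> Function2(const std::vector<int>& in) {
    std::vector<int> out;
    out.reserve(in.size());
    for (int x : in) out.push_back(x + 1);
    return out;
}

// Software version of the algorithm: call Function1(), then Function2().
std::vector<int> RunAlgorithm(const std::vector<int>& data) {
    std::vector<int> stage1 = Function1(data);
    return Function2(stage1);
}
```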
Such tools exist, and they’re called High-Level Synthesis (HLS) tools. They take parts of high-level
C/C++ programs that have no hardware notions and turn them into hardware models in Hardware
Description Language (HDL). Figure 2 below shows the two stand-alone hardware “Kernels”
generated by an HLS tool from Function 1 and Function 2.
Figure 2 - Hardware kernels compiled with HLS tool
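The sketch below shows what HLS input typically looks like. It rewrites a hypothetical stand-in for Function 1, a 3-tap moving sum named MovingSum3, in a style that HLS tools generally handle well: fixed-size arrays and loops with compile-time bounds. The supported C/C++ subset differs from tool to tool, but dynamic memory allocation and recursion are commonly excluded. The frame size, tap count, and function are illustrative assumptions, and the sketch contains no tool-specific pragmas.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kFrame = 8;  // hypothetical fixed frame size
constexpr std::size_t kTaps = 3;   // hypothetical number of filter taps

// Hypothetical stand-in for Function 1: 3-tap moving sum over one frame.
// No heap allocation, no recursion, and every loop has a compile-time bound.
std::array<int, kFrame> MovingSum3(const std::array<int, kFrame>& in) {
    std::array<int, kFrame> out{};
    std::array<int, kTaps> window{};  // shift register, starts at zero
    for (std::size_t i = 0; i < kFrame; ++i) {
        // Shift older samples down and insert the newest one.
        for (std::size_t t = kTaps - 1; t > 0; --t) {
            window[t] = window[t - 1];
        }
        window[0] = in[i];
        // Sum the current window.
        int acc = 0;
        for (std::size_t t = 0; t < kTaps; ++t) {
            acc += window[t];
        }
        out[i] = acc;
    }
    return out;
}
```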
However, as the following discussion will show, HLS tools, despite being very efficient, provide
only part of the solution. As we’ll see, much more is needed beyond HLS to build custom hardware
that can be used in a system.
Let’s start with the data being processed. Where does it come from? Perhaps we will acquire data
from an Ethernet network and, after processing, deliver it to a host CPU through a PCI Express link.
Figure 1 above represents a complete picture of the system to the software engineer, and yet
nothing in it provides any clue of how the data is acquired or sent off. So we need to add those to
the picture. Figure 3 includes the hardware components necessary to read and write data to and
from their respective input and output ports.
Figure 3 - Data I/O added to system.
In a realistic hardware system, each of these blocks needs control in order to function coherently.
Such control is implicit in a system with a CPU, but it must be explicitly designed in a hardware
system. Figure 4 includes that circuitry.
Figure 4 - Control block required for coherent operation
A real system may also have other peripherals, such as UARTs, LEDs, buttons, and keyboards.
They must be accounted for in the hardware design, as shown in Figure 5.
Figure 5 - Peripherals and other such hardware will affect operation
These capture the obvious hardware elements needed to support the main functions. But
underlying these is an important notion that is typically missing from the software domain: low-level timing and system startup. These involve clocks and reset signals, and they drive every block
in the system. There may even be multiple clocks with different frequencies, each having its own
“domain.” Managing those domains properly is critical to ensuring system stability and data
integrity. Figure 6 shows the required infrastructure for that.
Figure 6 - Clock and reset signals are critical to all parts of the hardware.
Compare Figure 6 with Figure 2, and recall that traditional HLS tools cover only what’s in Figure 2.
If we were to compare this to automobile design, it’s as if HLS tools create engines, which is useful,
but much more is needed in order to design complete cars. As we can see here, HLS is beneficial,
but far from sufficient.
Up to here, we’ve represented an abstract design process where the designer builds everything,
including the board, from scratch. In practice, though, it’s common to use existing off-the-shelf boards to save time. That requires further work to map the design onto the resources of the
selected board, generate the appropriate clocks, and handle other board-related tasks. This work is
simpler than designing a new board, but it is explicitly a hardware task that can be challenging for
software engineers. It can still take many weeks for an experienced hardware designer to bring up
a design on a new board.
Finally, no real design is ever completed without the need to debug errors. There are two domains
within which these errors could occur. First, the original functions may not have been specified
perfectly, and so they may require debug and iteration to correct problems. This is entirely within
the scope of the software engineer’s skills. Second, the support hardware, if not designed
carefully, could also contain bugs, and there would be no way for a software engineer to debug
this.
Software and hardware debug are very different. In particular, hardware debug tools revolve
around probing specific wires and observing the waveforms and timing to deduce where the bugs
are. These tools and notions will not be familiar to the typical software engineer.
Therefore, for a design methodology to be useful to a software engineer, any debug must involve
only software that the designer has explicitly created, and this debug process must be achievable
using standard software development methodologies and tools. No hardware debug – of the
functions or of the support hardware – should ever be required.
In summary, for a tool to enable software engineers to augment their applications with custom
hardware, it needs the following characteristics:
- Be readily usable by a software engineer who has no hardware expertise
- Be able to create functional hardware from pure software code
- Be able to incorporate existing hardware IP blocks if they are available
- Be able to infer and create all of the support hardware: interfaces, control, clocks, etc.
- Be able to support both commercial off-the-shelf boards and custom boards
- The inferred support hardware must be correct by construction so that it requires no hardware
  debug
- Debug of functional blocks must be performed using standard software debug tools only,
  with no hardware-level debug
The ambitious dream of allowing software engineers to create hardware has remained an elusive,
yet coveted, goal.
A Software-Centric Methodology
The overall process of implementing a design using QuickPlay is straightforward. It consists of:
1. Developing a C/C++ functional dataflow model of the hardware engine
2. Verifying the functional model with standard C/C++ debug tools
3. Specifying the target FPGA boards and interfaces (PCIe, Ethernet, DDR, QDR, etc.)
4. Compiling the HW engine
That’s all you need to get working hardware. However, in order for this simple process to work
seamlessly, the generated hardware engine must be guaranteed to function identically to the
original software model. Another way of stating this is that the functional model must be
deterministic so that, no matter whether executed in software or in any possible hardware
implementation, execution will give the same results, albeit at different speeds.
Unfortunately, most parallel systems suffer from non-deterministic execution. Multi-threaded
software execution, for example, depends on the CPU, on the OS, and on unrelated processes
running on the same host. Multiple runs of the same multi-threaded program can produce
different behaviors. Such non-determinism in hardware would be a nightmare, as it would require
debugging the hardware engine itself, at the electrical waveform level.
To eliminate this debug abstraction paradox, QuickPlay promotes an intuitive dataflow model that
mathematically guarantees deterministic execution, regardless of the execution engine. Such a
model consists of concurrent functions, called “kernels”, communicating through streaming channels,
which corresponds closely to how you might sketch your application on a whiteboard. In order to
guarantee deterministic behavior, these kernels must communicate with each other in a way that
prevents data hazards, such as race conditions. This is achieved with streaming channels that are:
- FIFO-based
- Blocking read and non-blocking write
- Point-to-point
Some of you may recognize these as the characteristics of a Kahn Process Network (KPN) – which
is indeed the model of computation QuickPlay is built upon.
Figure 7 - Kahn Process Network example in QuickPlay.
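A minimal C++ sketch of a KPN-style channel illustrates these three properties. The channel is an unbounded FIFO with a blocking read and a non-blocking write, used by exactly one writer and one reader. The Channel class and the surrounding functions are hypothetical and are not QuickPlay's API. The two producer kernels are delayed at random. The merge kernel reads its inputs in a fixed order and every read blocks, so the merged output is identical on every run regardless of thread timing.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <random>
#include <thread>
#include <vector>

// Point-to-point FIFO channel: exactly one writer thread and one reader thread.
template <typename T>
class Channel {
public:
    // Non-blocking write: the FIFO is unbounded, so the writer never waits.
    void write(const T& value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(value);
        }
        ready_.notify_one();
    }

    // Blocking read: waits until a token is available, then removes it.
    T read() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty(); });
        T value = queue_.front();
        queue_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> queue_;
};

// Producer kernel: writes n tokens starting at `base`, with random delays
// standing in for unpredictable scheduling.
void Producer(Channel<int>& out, int base, int n, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> delay_us(0, 200);
    for (int i = 0; i < n; ++i) {
        std::this_thread::sleep_for(std::chrono::microseconds(delay_us(rng)));
        out.write(base + i);
    }
}

// Builds the network: two producers feed a merge kernel (run on the calling
// thread) that alternately reads one token from `a` and one from `b`.
std::vector<int> RunNetwork(int n, unsigned seed) {
    Channel<int> a, b;
    std::thread pa(Producer, std::ref(a), 0, n, seed);
    std::thread pb(Producer, std::ref(b), 100, n, seed + 1);
    std::vector<int> merged;
    for (int i = 0; i < n; ++i) {
        merged.push_back(a.read());
        merged.push_back(b.read());
    }
    pa.join();
    pb.join();
    return merged;
}
```

A hardware FIFO is bounded, so its writes must stall when it is full. That changes timing, but in a KPN it does not change the output sequence, provided the FIFOs are deep enough to avoid an artificial deadlock.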
The contents of any kernel can be arbitrary C/C++ code. Kernels can also be defined hierarchically,
with one kernel containing a sub-network of kernels rather than code. Each kernel can then be
defined as:
- A C function compiled to hardware through an HLS engine, whether the QuickPlay HLS
  engine or your FPGA vendor’s HLS engine, or
- An existing piece of hardware IP, defined using a hardware description language, along
  with an accompanying C functional model.
QuickPlay’s design flow is straightforward, as shown in Figure 8 below.
Figure 8 – QuickPlay compilation and execution Flow.
A Simple 6-step process
The following steps describe the QuickPlay design flow in more detail.
Step 1: Pure software design
This is where you create your kernels, write C code to define their functional behavior, and connect
them together with streams. The QuickPlay Eclipse-based IDE provides a C/C++ library with simple
APIs to:
- Create kernels, streams, streaming ports, and memory ports
- Read and write to/from streaming ports and memory ports
In addition, the QuickPlay IDE provides an intuitive graphical editor that lets you program the way
you think - visually. Figure 7 is a screenshot of a simple design built within QuickPlay.
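The sketch below is a rough textual analogue of such a design. It wires the two functions from Figure 1 as kernels, each running in its own thread and connected by streams. A source feeds the network and a sink collects the results. The Stream class, the kernel signatures, and the convention that each kernel processes a known number of tokens are hypothetical simplifications for illustration. They are not QuickPlay's library APIs or generated code.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stream: unbounded FIFO with blocking read, non-blocking write.
template <typename T>
class Stream {
public:
    void write(const T& value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(value);
        }
        ready_.notify_one();
    }
    T read() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty(); });
        T value = queue_.front();
        queue_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> queue_;
};

// Kernel for Function 1 (hypothetical body): scale each token.
void Function1Kernel(Stream<int>& in, Stream<int>& out, int n) {
    for (int i = 0; i < n; ++i) out.write(2 * in.read());
}

// Kernel for Function 2 (hypothetical body): offset each token.
void Function2Kernel(Stream<int>& in, Stream<int>& out, int n) {
    for (int i = 0; i < n; ++i) out.write(in.read() + 1);
}

// Top level: source -> Function 1 -> Function 2 -> sink, one thread per kernel.
std::vector<int> RunDesign(const std::vector<int>& input) {
    const int n = static_cast<int>(input.size());
    Stream<int> source, middle, sink;
    std::thread k1(Function1Kernel, std::ref(source), std::ref(middle), n);
    std::thread k2(Function2Kernel, std::ref(middle), std::ref(sink), n);
    for (int x : input) source.write(x);  // source: feed the test inputs
    std::vector<int> result;
    for (int i = 0; i < n; ++i) result.push_back(sink.read());  // sink
    k1.join();
    k2.join();
    return result;
}
```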
Step 2: Functional verification
In this step, the focus is on making sure that the software model written in Step 1 works correctly.
This is done by compiling the software model on your desktop, executing it with test inputs, and
verifying the correctness of the outputs. The software model is parallel, with a distinct thread for
each kernel, but because QuickPlay’s KPN modeling provides deterministic execution, you don’t
have to worry about the concurrency and can focus on the true functional bugs.
Debugging your software model is done with standard software debug techniques and tools –
breakpoints, watchpoints, step-by-step execution, printf, etc. You’ll probably be running more
tests once it’s in hardware, which will likely uncover more bugs, but we’ll deal with that shortly.
From a design flow standpoint, this is where you do all of your verification. You will not need any
further debugging at the hardware level.
It’s also important to remember that the functional model involves none of the infrastructure. In
the example above, the focus is on the contents of Figure 1. None of the system aspects added in
Figure 3 through Figure 6 – communication components, control plane, clocking & resets, etc. - are
in play during this modeling and verification phase.
Step 3: Hardware generation
This is where you generate hardware from your software model. To do this, you:
- Select which FPGA board to target. QuickPlay can implement designs on a growing
  selection of off-the-shelf boards. These boards typically feature leading-edge Altera or
  Xilinx FPGAs, a PCIe 3.0 link, 10Gb Ethernet, application-specific interfaces, DDR3/4 SDRAM,
  QDR2+ SRAM, flash memory, and more.
  Selection is done through a simple menu in the QuickPlay tool.
Figure 9 – Off-the-shelf FPGA board
- Map your input and output ports to the board’s physical interfaces. This is done
  through simple menu selections. Some of these interfaces can be:
  - PCI Express
  - TCP/IP, UDP over 10Gb Ethernet
  - HD SDI, HDMI, DisplayPort, SMPTE 2022
  - DDR3, DDR4, QDR2+, flash memory
  - …
  Some of these protocols may require minimal user information, such as a MAC and IP
  address for a TCP/IP interface.
  For every board that QuickPlay supports, the corresponding interfaces are available for the
  user to select and use in their design.
  Selecting the communication protocol automatically invokes not only the hardware IP
  required to implement the connection, but also any software stacks layered over it, so that
  the complete system is instantiated.
- For FIFOs and memories, select whether to use FPGA internal SRAM or on-board external
  memory.
- Push the “Build” button. This will compile software, run the HLS tool, create the system
  hardware, and run any other tools necessary to build the hardware images that the board
  will require. No manual intervention is required to complete this process.
Step 4: System execution
This is similar to the execution of the functional model in Step 2, except that now the application is
running in hardware and software on a real system. This means that you can stream real data in.
Because this runs much faster than the software model, and because you can use live data sources,
you can run many more tests at this stage than you could during Step 2 (Functional verification),
dramatically increasing your test coverage.
Step 5: System debug
Because you’re running so many more tests now, you’re likely to uncover functional bugs that
weren’t uncovered in Step 2. So now how do you debug those new bugs? As mentioned before,
you never have to debug at the hardware level, even if the bug is discovered after executing a
function in hardware. Because QuickPlay guarantees that the generated hardware is functionally
equivalent to the software model, a bug discovered at this stage actually reflects a bug in the
original algorithm. Therefore any bug in the hardware version has to exist in the software version
as well. This is why you don’t need to debug in hardware; you can debug exclusively in the
software domain.
But you do need to have a way to identify the test sequence that failed in hardware so that you
can run that identical test sequence on the software functional model. QuickPlay captures the
hardware tests as they run and can then import any test back into the software environment
where you actually do your debug.
This is possible because the hardware system is automatically provisioned with infrastructure for
observing all of the critical points of the design. Figure 10 shows the system of Figure 6 with added
debug circuitry. Without QuickPlay, some sort of debug infrastructure would have to be inserted
and managed by hand; with QuickPlay, this all becomes transparent.
Figure 10 - Debug infrastructure is automatically created.
The overall process, then, as illustrated in Figure 11, is to model in software, then build the system
and test in hardware. If there are any bugs, import the failing test sequences back into the
software environment, debug there, fix the source code, and then repeat the process. This
represents a dramatic productivity improvement over traditional flows.
Figure 11 - Debug happens only in software domain
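A small, self-contained C++ sketch can illustrate the capture-and-replay loop. Everything in it is hypothetical: the whitespace-separated text log, the function names, and the deliberately buggy saturate kernel were invented for illustration. They do not describe how QuickPlay stores captured tests. The point is the workflow. Input tokens observed during a hardware run are serialized and then fed back through the unchanged software model, which reproduces the failure where ordinary debug tools can examine it.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Software model of a kernel meant to clamp values to 255.
// Deliberate, hypothetical bug: the comparison should be x > 255.
int SaturateKernel(int x) {
    if (x > 256) return 255;
    return x;
}

// Serialize the input tokens observed on a stream as whitespace-separated text.
// In a real flow these tokens would be captured during the hardware run.
std::string CaptureLog(const std::vector<int>& tokens) {
    std::ostringstream log;
    for (int t : tokens) log << t << ' ';
    return log.str();
}

// Parse a captured log back into tokens for the software environment.
std::vector<int> LoadLog(const std::string& log) {
    std::istringstream in(log);
    std::vector<int> tokens;
    int t = 0;
    while (in >> t) tokens.push_back(t);
    return tokens;
}

// Replay captured tokens through the software model and return the index of
// the first output that violates the spec (value above 255), or -1 if none.
int ReplayFindFailure(const std::string& log) {
    const std::vector<int> tokens = LoadLog(log);
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (SaturateKernel(tokens[i]) > 255) return static_cast<int>(i);
    }
    return -1;
}
```

For example, replaying a captured log of 10 300 256 42 reports index 2, the value 256. That single input can then be stepped through SaturateKernel() in an ordinary debugger.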
Step 6 (Optional): System optimization
Once you have completed the debug phase, you are done: your system is complete. However, you
may want to make some performance optimizations, and this is the time to do that – when you
know that your system is running correctly.
The first optimization you should consider is to refine your functional model. There are probably
additional concurrency opportunities available, for example, so you may try decomposing or
refactoring the functions a different way. At this level, optimizations can yield spectacular
performance improvements.
Second, you may want to try a different FPGA board. Because the mapping from the functional
model to the board is so easy, it’s very simple to try a variety of boards to select the optimal one.
The third optimization has to do with the hardware kernels that QuickPlay creates via HLS. While
the resulting hardware is guaranteed to operate correctly and efficiently, it may not operate as
efficiently as hardware hand-crafted by a hardware engineer. So at this stage, you have several
options:
- Optimize your code and tune QuickPlay HLS settings to improve the generated hardware.
- Choose a third-party HLS tool to generate more efficient hardware.
- Have a hardware team hand-craft the most critical blocks.
None of these steps is required, but they provide options when you need better hardware but
have limited hardware design resources available. A hardware engineer may be able to help with
these optimizations. Once any of these changes is made, the build process is simply repeated.
A Universal Streaming Conduit
QuickPlay enables rapid design of hardware-augmented applications, with broad software
architecture flexibility. It is based on a data-flow model of computation where data moves through
streaming channels that can have many different physical incarnations:
| Streaming Type | Physical Media |
|---|---|
| Kernel to kernel | FPGA fabric |
| Kernel to FPGA SRAM memory | FPGA fabric |
| Kernel to DDR memory | DDR link |
| Kernel to QDR memory | QDR link |
| Kernel to embedded CPU | FPGA fabric |
| Kernel to external CPU | PCIe link; TCP/IP Ethernet network |
QuickPlay provides a universal streaming API that entirely abstracts away the underlying physical
communication protocol. Streaming data is received via the ReadStream() function and sent via
the WriteStream() function. These functions can be used to send and receive data between
kernels, to and from embedded or board-level memory, and to and from an embedded or external
host CPU, thus providing broad architectural flexibility with no need to comprehend or manage
the underlying low-level protocols.
The hardware through which that data arrives and departs is determined by the selected protocol.
Selecting the desired protocol sets up not only the hardware needed to implement the protocol,
but also the software stacks required to support the higher protocol layers, as shown in Figure 12
below.
Figure 12 - Universal Streaming API
The exact implementation of these reads and writes (size, alignment, marshaling, etc.) is
managed by QuickPlay. The most important characteristic of the ReadStream and
WriteStream statements is that they’re blocking: when either statement is encountered,
execution will not pass to the next statement until all of the expected data has been read or
written. This is important for realizing the determinism of the algorithm.
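The blocking semantics described above can be sketched as a read that does not return until all of the requested elements have arrived. The Stream class, its read_block() method, and the demo function are hypothetical illustrations of the semantics, not QuickPlay's implementation. For brevity, the write side here never blocks, as in the KPN channel model described earlier.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

template <typename T>
class Stream {
public:
    // Non-blocking write of a single element.
    void write(const T& value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(value);
        }
        ready_.notify_one();
    }

    // Blocks until `count` elements are available, then returns all of them
    // in FIFO order; the caller cannot proceed with a partial block.
    std::vector<T> read_block(std::size_t count) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [&] { return queue_.size() >= count; });
        const auto end = queue_.begin() + static_cast<std::ptrdiff_t>(count);
        std::vector<T> block(queue_.begin(), end);
        queue_.erase(queue_.begin(), end);
        return block;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<T> queue_;
};

// The writer trickles in one element per millisecond; the reader requests
// the whole block at once and is released only after the last one arrives.
std::vector<int> ReadWholeBlockDemo(int n) {
    Stream<int> stream;
    std::thread writer([&stream, n] {
        for (int i = 0; i < n; ++i) {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
            stream.write(i);
        }
    });
    std::vector<int> block = stream.read_block(static_cast<std::size_t>(n));
    writer.join();
    return block;
}
```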
The “binding” between the generic ReadStream and WriteStream statements and the actual
underlying protocol hardware occurs at run time via the QuickPlay Library. Not only does this keep
the communication details from cluttering up the software program, it also provides modularity
and portability. The communication protocol can easily be changed without requiring any changes
to the actual kernel code or host software. The ReadStream and WriteStream statements
will automatically bind to whichever protocol has been selected with no effect on program
semantics.
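The following C++ sketch illustrates this kind of binding. A kernel written against an abstract stream interface runs unchanged whether the stream is backed by an in-memory FIFO or by a transport that marshals each value into bytes, as a physical link would. The IntStream interface and both backends are hypothetical. They are not QuickPlay's ReadStream()/WriteStream() signatures, the binding here is ordinary C++ virtual dispatch, and the byte-level backend is a single-threaded loopback rather than a real protocol.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical transport-neutral stream interface.
class IntStream {
public:
    virtual ~IntStream() = default;
    virtual void write(std::int32_t value) = 0;
    virtual std::int32_t read() = 0;  // precondition: data is available
};

// Backend 1: values stay in an in-memory FIFO.
class FifoStream : public IntStream {
public:
    void write(std::int32_t value) override { queue_.push_back(value); }
    std::int32_t read() override {
        std::int32_t value = queue_.front();
        queue_.pop_front();
        return value;
    }

private:
    std::deque<std::int32_t> queue_;
};

// Backend 2: each value is marshaled into 4 little-endian bytes and
// reassembled on read, standing in for a serial link (loopback only).
class ByteLinkStream : public IntStream {
public:
    void write(std::int32_t value) override {
        const auto bits = static_cast<std::uint32_t>(value);
        for (int i = 0; i < 4; ++i) {
            bytes_.push_back(static_cast<std::uint8_t>(bits >> (8 * i)));
        }
    }
    std::int32_t read() override {
        std::uint32_t bits = 0;
        for (int i = 0; i < 4; ++i) {
            bits |= static_cast<std::uint32_t>(bytes_.front()) << (8 * i);
            bytes_.pop_front();
        }
        return static_cast<std::int32_t>(bits);
    }

private:
    std::deque<std::uint8_t> bytes_;
};

// Kernel code depends only on the interface, never on the backend.
void SquareKernel(IntStream& out, const std::vector<std::int32_t>& in) {
    for (std::int32_t x : in) out.write(x * x);
}

// Receiver: reads n values from whichever backend was bound.
std::vector<std::int32_t> Drain(IntStream& in, std::size_t n) {
    std::vector<std::int32_t> values;
    for (std::size_t i = 0; i < n; ++i) values.push_back(in.read());
    return values;
}
```

Swapping FifoStream for ByteLinkStream changes only the object passed to SquareKernel(). The kernel source stays the same.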
As a result of the abstraction that QuickPlay provides, the software algorithms remain pure,
focusing solely on data manipulation in a manner that’s completely independent of the underlying
communication details.
Quick to learn; production quality; first to market
The learning curve to use QuickPlay is modest. Building KPN models may take a little study, but it
should be intuitive for most users since it is based upon the natural functional representation of
computing systems.
Depending on the HLS tool being used, results might be improved by learning coding styles that
result in more efficient hardware generation, but that is optional. Any design can be done without
code restructuring, and many designs won’t require it at all.
Whether you use an off-the-shelf board or a custom board, the systems you create using QuickPlay
are production-worthy. As a result, QuickPlay is the fastest way to get from a system
idea to a hardware-augmented application. A process that would normally take months is reduced
to days, and you can be deploying and shipping to your customers faster than you imagined
possible.
All of this makes QuickPlay a unique tool that achieves the long-sought goal of allowing software
engineers and hardware engineers to implement systems based on custom FPGA hardware and be
production-ready months ahead of fully handcrafted designs. By working in their familiar domain,
software engineers can make use of custom hardware as needed, automatically generating
hardware-augmented applications. By working with a higher level of abstraction, hardware
engineers can benefit from the automatically generated optimized hardware infrastructure, while
focusing their unique expertise on a select few key components of the system.
In summary, then, QuickPlay uniquely provides a methodology that:
- Involves only software tools and techniques, requiring little to no hardware expertise
- Is capable of creating a hardware implementation from pure, untimed software
- Can integrate functional hardware IP blocks if desired
- Can infer and create all necessary supporting hardware infrastructure
- Supports a growing ecosystem of FPGA boards
- Creates correct-by-construction infrastructure that never needs debugging
- Allows functional debug purely in the software domain, with no hardware-level debug required
About QuickPlay
QuickPlay is an initiative of PLDA GROUP, a privately owned, self-funded technology group that
has served the embedded electronics industry since 1996 by providing a broad range of leading-edge
products and services to over 5,000 companies worldwide. QuickPlay embodies our long-standing
vision that FPGA computing should be as approachable as ubiquitous CPU computing. QuickPlay is
the result of years of research in the field of High-Level Design (HLD) and High-Level Synthesis
(HLS) tightly coupled with a strong expertise in FPGA hardware and IP design. QuickPlay is proof
that innovations happen when talents from different engineering perspectives are brought
together to work on a common cause.
PLDA GROUP has R&D offices in France, Italy, Bulgaria, and the USA.
More info at: www.quickplay.io