in this wiki.

Introduction
This wiki page describes how to build the various components of the Zynq UltraScale+ MPSoC
Software Acceleration Targeted Reference Design (TRD). It also explains how to set up the
hardware and software platforms and run the design on the ZCU102 kit. The part used on the ZCU102
board is xczu9eg-ffvb1156-1-e-es1 (active).
About the TRD
The Software Acceleration TRD is an embedded signal processing application designed to showcase
various features and capabilities of the Zynq UltraScale+ MPSoC ZU9EG device for the embedded domain.
The TRD consists of two elements: the Zynq UltraScale+ MPSoC Processing System (PS) and a signal
processing application (FFT) implemented in Programmable Logic (PL). The MPSoC allows the user to
implement a signal processing algorithm that performs FFT on samples (coming from the TPG in PL or
from SYSMON through an external channel) either as a software program running on the Zynq
UltraScale+ MPSoC-based PS or as a hardware accelerator inside the PL. The TRD demonstrates how the
user can seamlessly switch between a software and a hardware implementation and evaluate the cost
and benefit of each. The TRD also demonstrates the value of offloading computation-intensive tasks
onto the PL, thereby freeing CPU resources for user-specific applications.
For additional information, please refer to the TRD user guide here. <link to the TRD UG1211>
Download the TRD
The TRD archive can be downloaded from here. <path to TRD zip file on lounge>
TRD package content
The Software Acceleration TRD package is released with the source code, Xilinx Vivado and SDK
projects, and an SD card image that enables the user to run the demonstration and software application.
It also includes the binaries necessary to configure and boot the ZCU102 board. Before running the
steps described in this wiki page, the user must download the TRD package and extract its contents to a
directory referred to as the ‘TRD home directory’ in this wiki.
Folder/file                                          Description
hardware                                             Contains hardware design files
  Sources                                            Contains HDL sources, constraints and local IP repository
  Vivado/scripts                                     Contains the scripts to build the hardware design
Software                                             Contains the software source files
  Petalinux
  Xsdk
Ready_to_test                                        Contains ready to test binaries
  BOOT.BIN                                           BIN file containing FSBL, PL bitstream, U-boot and ARM trusted firmware
  Image.ub                                           Kernel Image
  Autostart.sh                                       Script to launch the demo
  Bin                                                This directory contains the Qt GUI application.
README.txt                                           Contains design version history, steps to implement the design,
                                                     Vivado and Petalinux versions to be used to build the design.
THIRD_PARTY_NOTICES.zip                              Licensing info
IMPORTANT_NOTICE_CONCERNING_THIRD_PARTY-CONTENT.txt  Licensing info
Pre-requisites <Software Team to review this section>

- ZCU102 Evaluation Kit with Xilinx Vivado Design Suite, device-locked to xczu9eg-ffvb1156-1-e-es1.
- A Linux development PC with the ARM GNU tools installed.
- A Linux development PC with the distributed version control system Git installed. For information,
  refer to the Xilinx Git wiki.
- A Linux development PC with the Qt and Qwt libraries cross-compiled for the Zynq platform. Set the
  ZYNQ_QT_INSTALL environment variable by referring to Xilinx Zynq Qt/Qwt Libraries - Build
  Instructions.
- GNU make utility version 3.81 or higher.
Known Issues
No known issues so far.
Running the demo
This section provides step-by-step instructions for bringing up the ZCU102 board for the demonstration
part of the TRD and running the different options out of the box.
The binaries required to run the design, including those needed to configure and boot the ZCU102
board, are in the $TRD_HOME/ready_to_test folder.
a) Using an SD-MMC card reader, copy the entire contents of $TRD_HOME/ready_to_test onto the primary
partition of the SD-MMC card, which must be formatted as FAT32.
b) Petalinux console login details:
User: root
Password: root
Hardware Setup Requirements
Requirements for the TRD Linux application demo setup:
- The ZCU102 Evaluation Kit with the part xczu9eg-ffvb1156-1
- AC power adapter (12 VDC)
- Optional: A USB Type-A to USB Micro-B cable (for UART communications) and a Tera Term
  Pro (or similar) UART terminal program.
- USB-UART drivers from Silicon Labs
- An SD-MMC flash card containing the TRD binaries, formatted with FAT32. The SD-MMC card is
  pre-loaded with the required binaries in its first partition. The pre-loaded binaries include:
  o BOOT.BIN
  o Image.ub
  o autostart.sh
  o bin/QtGUI
- A USB Micro-B to female adapter with a USB hub, needed for connecting a keyboard and a mouse.
- A USB mouse and keyboard.
- A 4K monitor that supports Ultra HD resolution: 3840x2160p@30 Hz <Other resolutions??>
Note: It is recommended to use a ZCU102 production board. The TRD binaries have been tested with the
<add list of monitors??> display monitor. However, the binaries should work with any
DisplayPort-compatible output device, provided it advertises 4K resolution in its EDID.
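Before inserting the card, the presence of the pre-loaded binaries listed above can be checked from a host PC. The following is a minimal Python sketch and is not part of the TRD package; the function name missing_entries and the mount point /media/sdcard are hypothetical, and only the four entry names are taken from this page.

```python
from pathlib import Path

# Entries this page lists as pre-loaded on the first partition of the SD-MMC card.
REQUIRED_ENTRIES = ["BOOT.BIN", "Image.ub", "autostart.sh", "bin/QtGUI"]


def missing_entries(sd_root):
    """Return the required entries that do not exist under sd_root."""
    root = Path(sd_root)
    return [entry for entry in REQUIRED_ENTRIES if not (root / entry).exists()]


if __name__ == "__main__":
    # Hypothetical mount point; replace with where the card is actually mounted.
    missing = missing_entries("/media/sdcard")
    if missing:
        print("Missing on SD card:", ", ".join(missing))
    else:
        print("All TRD binaries found.")
```

An empty list means every listed entry was found; the check only tests for presence and does not validate file contents.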
Board Setup
Steps for setting up the board:
- Connect the cables to the ZCU102 board as shown in the figure below.
<Placeholder for ZCU102 board setup>
- Connect a 4K monitor to the DP port on the ZCU102 using a DP 1.2 cable. DP cable version important??
- Connect a USB3.0 mouse to the Micro-B USB connector.
- Optional: Connect the Micro-B end of the USB cable to the port labeled USB UART on the ZCU102
  board and the Type-A end to an open USB port on the host PC for UART communications.
- Connect the power supply to the ZCU102 board. Do not switch the power on.
- Insert the SD-MMC memory card, which contains the TRD binaries, into the SD slot on the ZCU102 board.
- Make sure the switches are set as shown in the figure below, which allows the ZCU102 board to boot
  from the SD-MMC card.
<Place holder for SW DIP switches for SD boot mode>
Run Qt GUI application
A Linux application with a Qt-based GUI is included in the package on the SD-MMC memory card. This
application lets the user exercise the different modes of the demonstration. The user can select
Test Pattern Generator (TPG) samples or an external audio source (requires the XA3 SYSMON
Headphone Adapter card from Faster Technology and an audio source such as an MP3 player).
The user can choose to perform the FFT computation in the APU (run as software code on the PS) or in
the PL (run in the FPGA fabric as a hardware IP core).
The user can also apply various windowing techniques to the input samples before performing the FFT.
Powering on the Qt-based GUI application demo
- Make sure the monitor is set for DP Ultra HD (4K) resolution.
- Turn on power switch SW<#?>.
Note: The Linux image and the Qt-based GUI application are loaded from the SD-MMC memory card.
- The Linux image loads and the frame buffer console is displayed on the monitor.
- The Linux Qt-based GUI loads.
<Place holder for first GUI screenshot>
Running the Qt-based GUI application demo

When the GUI starts, the demonstration begins with the FFT being computed by software running on the
APU, on samples coming from the TPG in PL. In the CPU graph, one CPU is always 100% utilized while the
other A53 cores show a low level of activity. The full power domain AXI HP port 1 shows a write
throughput of around <?Mbps>, which corresponds to the samples passing from the TPG to PS DDR. The
read bandwidth is 0 because the TPG only writes samples to PS DDR.
<Screenshot>
Exercise the different options by pressing the buttons available in the GUI to evaluate the use cases
listed below.
Use case  Input source
1         Test Pattern Generator (TPG)
2         External audio (through XA3 SYSMON Headphone Adapter card)
For the two use cases listed in the table above, the user can select one of the following compute
engines for FFT computation.
FFT Compute Engine             Description
APU                            FFT computation is done by software running on APU.
APU with Neon as Co-processor  FFT computation is done by software running on APU. The Neon intrinsic
                               APIs are used for FFT computation to make sure instructions are
                               executed on NEON.
APU controlled PL accelerator  FFT computation is done by FFT IP in PL.
RPU                            FFT computation is done by software running on RPU. APU is involved in
                               moving samples from TPG in PL to PS DDR. Samples from PS DDR are copied
                               to OCM by APU software and that information is passed to RPU through
                               OpenAMP channel.
RPU controlled PL accelerator  FFT computation is done by PL FFT IP. RPU controls the AXI DMA transfers
                               to/from PL FFT core from/to PS DDR. APU is involved in moving samples
                               from TPG in PL to PS DDR. Samples from PS DDR are copied to OCM by APU
                               software and that information is passed to RPU through OpenAMP channel.
                               PL FFT core fetches samples from OCM and computes FFT on the samples and
                               writes samples back to OCM.
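For orientation, the transform that each engine in the table computes can be sketched as a textbook recursive radix-2 FFT. This Python sketch is illustrative only: it is not the TRD's APU, NEON, RPU, or PL implementation, and the function names fft and magnitude_spectrum are hypothetical.

```python
import cmath


def fft(samples):
    """Recursive radix-2 decimation-in-time FFT; length must be a power of two."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])  # transform of even-indexed samples
    odd = fft(samples[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor applied to the odd half (butterfly operation).
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def magnitude_spectrum(samples):
    """Magnitudes of the first half of the FFT output (real-valued input)."""
    return [abs(x) for x in fft(samples)[: len(samples) // 2]]
```

All FFT sizes offered by the GUI are powers of two, which is the case this radix-2 form handles.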
The user can run the following FFT sizes:
FFT Size
4096
8192
16384
32768
65536
The user can apply one of the following window functions to the input samples before FFT computation:
Window function
Hann
Hamming
Blackman
Blackman Harris
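As an illustration of what applying a window means, the sketch below generates the four listed windows from their standard textbook cosine-sum definitions and multiplies them into a block of samples. The coefficient values are the commonly published ones and are an assumption here; this page does not state which exact coefficients the TRD uses, and the names WINDOW_COEFFS, window, and apply_window are hypothetical.

```python
import math

# Textbook cosine-sum coefficients a_0, a_1, ... for each window (assumed values).
WINDOW_COEFFS = {
    "hann": (0.5, 0.5),
    "hamming": (0.54, 0.46),
    "blackman": (0.42, 0.5, 0.08),
    "blackman_harris": (0.35875, 0.48829, 0.14128, 0.01168),
}


def window(name, n):
    """Return n symmetric coefficients w[k] = sum_i (-1)^i * a_i * cos(2*pi*i*k/(n-1))."""
    if n < 2:
        raise ValueError("window length must be at least 2")
    coeffs = WINDOW_COEFFS[name]
    out = []
    for k in range(n):
        x = 2.0 * math.pi * k / (n - 1)
        out.append(sum((-1) ** i * a * math.cos(i * x) for i, a in enumerate(coeffs)))
    return out


def apply_window(samples, name):
    """Multiply each sample by the matching window coefficient."""
    return [s * c for s, c in zip(samples, window(name, len(samples)))]
```

Each window tapers the block toward its edges, which reduces spectral leakage in the FFT output at the cost of a wider main lobe.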
The user can select one of the following frequency zoom options:
FFT Zoom option  Description
ZOOM             This is the default option. Selecting this option fixes the units on the frequency
                 axis in the frequency domain plot to 512. This enables users to closely observe the
                 values on the frequency axis.
NO ZOOM          Selecting this option plots all points on the frequency axis (number of points equal
                 to half of the FFT size).
The user can select the voltage scale. This option is important when an external audio source is used
as input: the voltage of the samples depends on the volume of the audio signal, so the scale can be
chosen to match the amplitude of the audio samples. Available options are:
FFT Scale
1V (Default)
0.5V
0.25V
0.1V
The sampling rate of the SYSMON in PL can be changed at run time. Supported sampling rates are:
Sampling Rate
200 kSPS (default)
100 kSPS
50 kSPS
Note: The sampling rate option applies only to SYSMON and is visible on the GUI only when the input
source is set to External Audio source.
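The sampling rate and the FFT size together determine the spacing of points on the frequency axis. The sketch below uses the standard relation bin width = sampling rate / FFT size for the rates and sizes listed on this page; it is a generic signal processing calculation, not code from the TRD, and the function names are hypothetical.

```python
FFT_SIZES = [4096, 8192, 16384, 32768, 65536]
SAMPLING_RATES_SPS = [200_000, 100_000, 50_000]


def bin_width_hz(sampling_rate_sps, fft_size):
    """Frequency spacing between adjacent FFT output points."""
    return sampling_rate_sps / fft_size


def plotted_points_no_zoom(fft_size):
    """Number of points on the frequency axis with NO ZOOM (half the FFT size)."""
    return fft_size // 2


if __name__ == "__main__":
    for fs in SAMPLING_RATES_SPS:
        for n in FFT_SIZES:
            print(f"{fs} SPS, N={n}: {bin_width_hz(fs, n):.3f} Hz per bin, "
                  f"{plotted_points_no_zoom(n)} points up to {fs / 2:.0f} Hz")
```

A lower sampling rate or a larger FFT size gives finer frequency resolution, while the highest representable frequency is half the sampling rate.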
The time taken for FFT computation by each engine is plotted on the “FFT computation plot”. The
average computation times are captured for reference in the table below:
Computation Engine             Average computation time (us)  Comments
APU                            480
APU with Neon as Co-processor  350
APU controlled PL              110
RPU                            1200                           Includes OpenAMP channel delays
RPU controlled PL              210                            Includes OpenAMP channel delays
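The relative benefit of each engine follows directly from these averages. The short Python sketch below divides one average by another to get a speedup factor; the numbers are copied from the table above, and the helper name speedup is hypothetical.

```python
# Average FFT computation times in microseconds, as listed in the table above.
AVG_TIME_US = {
    "APU": 480,
    "APU with Neon as Co-processor": 350,
    "APU controlled PL": 110,
    "RPU": 1200,
    "RPU controlled PL": 210,
}


def speedup(baseline, engine):
    """How many times faster `engine` is than `baseline`."""
    return AVG_TIME_US[baseline] / AVG_TIME_US[engine]


if __name__ == "__main__":
    for base, engine in [
        ("APU", "APU with Neon as Co-processor"),
        ("APU", "APU controlled PL"),
        ("RPU", "RPU controlled PL"),
    ]:
        print(f"{engine} vs {base}: {speedup(base, engine):.2f}x")
```

Note that the RPU figures include OpenAMP channel delays, so they are not a like-for-like comparison with the APU figures.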
The utilization of the APU cluster A53 cores is plotted in the “CPU Utilization Plot”.
<Figure?>
The bandwidth utilization of the full power domain and low power domain high performance ports is
plotted in the “PS-PL performance plot”. Both the write and the read throughput are plotted.
<Figure?>
The PL die temperature is read from the SYSMON and displayed on the GUI.
<Figure?>
Building the Hardware design using Vivado
This section explains how to generate the FPGA hardware bitstream using the Xilinx Vivado tool and how
to export the hardware platform to Xilinx Software Development Kit (XSDK) for software application
development.
Steps for building the FPGA hardware bitstream
1. Launch Vivado project
On Windows 7, select Start > All Programs > Xilinx Design Tools > Vivado 2016.1 > Vivado 2016.1 Tcl
shell.
On Linux, enter vivado at the command prompt.
NOTE for Windows users: Because of the Windows file path limit (255 characters), copy the 'hardware'
directory from '$TRD_HOME/' directly to the root of a drive before following the next steps to build
the hardware bitstream. If the design still errors out due to the path length limitation, follow the
steps mentioned in the Answer Record.
From the Vivado welcome screen, in the Tcl console, run the following commands:
1. cd $TRD_HOME/hardware/vivado/scripts
2. source ./swaccel_trd.tcl
The above step creates the project ‘swaccel_trd’ as shown in the figure below.
In the Flow Navigator pane on the left-hand side, under Program and Debug, click Generate Bitstream >
Yes (shown in the figure below).
After the bitstream has been generated successfully, the user sees a screen as shown in the figure
below. The bitstream is generated at
$TRD_HOME/hardware/vivado/runs/swaccel_trd.runs/impl_1/swaccel_trd.bit
Before the hardware design can be exported, the implemented design has to be opened. Select Open
Implemented Design > OK.
To export the hardware design, click File > Export > Export Hardware as shown in the figure below.
Select the option Include bitstream as shown in the figure below.
The SDK hardware platform is exported
to $TRD_HOME/hardware/vivado/runs/swaccel_trd.sdk/swaccel_trd.hdf
To exit Vivado, click the X button in the top right corner of the Vivado IDE. Click OK to exit.
Petalinux
<Ravi/Pallav, please add the steps here>