poster - FASTER

Automating Resource Optimisation
in Reconfigurable Design
Xinyu Niu, Thomas C.P. Chau, Qiwei Jin, Wayne Luk and Qiang Liu
{nx210, cpc10, qj04 , wl}@doc.ic.ac.uk [email protected]
ABSTRACT
• New design approach: automatically identify and exploit run-time reconfiguration for optimising resource utilisation
• Configuration Data Flow Graph: a hierarchical graph structure for synthesis of reconfigurable designs in three steps
• Evaluation: barrier option pricing (finance), particle filter (robotics), reverse time migration (oil and gas)
• Improvement: 1.61 to 2.19 times faster than optimised static FPGA designs, up to 28.8 times faster than optimised CPU reference designs, and 1.55 times faster than optimised GPU designs; up to 29 times more energy efficient than CPU/GPU
Configuration Level
• Generate configurations based on assigned ALAP and ATAP levels
• Optimise configurations to fully utilise available resources
• Adopt analysis of resource consumption and bandwidth in the design model
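As an illustration of the configuration-level idea of filling available resources while respecting bandwidth, here is a minimal Python sketch. It is not the poster's design model: the function name `max_replication`, the dictionary fields and the rule "replicate every function in a configuration by the same integer factor" are assumptions made only for this example.

```python
def max_replication(functions, resource_limit, bandwidth_limit):
    """Largest integer factor p such that p copies of every function in one
    configuration fit both the resource budget and the bandwidth budget.

    functions: list of dicts with positive "resource" and "bandwidth" values
    (hypothetical units, e.g. logic cells and bytes per cycle).
    """
    resource = sum(f["resource"] for f in functions)
    bandwidth = sum(f["bandwidth"] for f in functions)
    if resource <= 0 or bandwidth <= 0:
        raise ValueError("resource and bandwidth demands must be positive")
    # The tighter of the two constraints decides the replication factor.
    return min(resource_limit // resource, bandwidth_limit // bandwidth)


config = [
    {"resource": 30, "bandwidth": 4},
    {"resource": 20, "bandwidth": 6},
]
# Resources alone would allow 4 copies; bandwidth limits the design to 3.
print(max_replication(config, resource_limit=220, bandwidth_limit=35))
```

In this sketch a configuration that is bandwidth-bound leaves resources unused, which is why both quantities have to be analysed together.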
Motivating Example
[Figure: tick labels n, 2n, 3n, 4n and n, 2n, 7n/3, 8n/3]
• Static design: accommodate all functions to accomplish the application
• Dynamic design: reconfigure design dynamically, implement only active functions
Partition Level
• Group configurations into partitions by mapping configurations into a Configuration Graph
• Assign design rules to build the search space
• Generate valid partitions within the pruned search space by a recursive search algorithm
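The partition-level search (design rules that prune the search space, followed by recursive generation of valid partitions) can be sketched as follows. This is a hedged illustration only: the function `valid_partitions`, the restriction to contiguous groups of an ordered configuration list, and the single design rule "a group must fit the device capacity" are assumptions, not the rules used by the authors.

```python
def valid_partitions(configs, capacity):
    """Recursively enumerate ways to split an ordered list of configurations
    into contiguous groups whose summed resource usage fits `capacity`.

    configs: list of (name, resource) pairs in execution order, with
    non-negative resource values (hypothetical units).
    """
    if not configs:
        return [[]]  # one way to partition nothing: the empty partition
    results = []
    used = 0
    for end in range(1, len(configs) + 1):
        used += configs[end - 1][1]
        if used > capacity:
            break  # pruning: any longer group would also exceed capacity
        head = [name for name, _ in configs[:end]]
        for rest in valid_partitions(configs[end:], capacity):
            results.append([head] + rest)
    return results


configs = [("c0", 40), ("c1", 50), ("c2", 30)]
for partition in valid_partitions(configs, capacity=100):
    print(partition)
```

With these illustrative numbers the rule discards the single group containing all three configurations, so three of the four possible contiguous splits are generated. A selection step would then pick the best of the valid partitions under a chosen cost model.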
Design Flow
Results
Design issues are addressed at multiple levels:
• Function level: extract function information, assign data dependency levels
• Configuration level: generate configurations based on design rules, optimise configurations
• Partition level: generate partitions recursively, select the optimal run-time solutions
Function Level
Algorithm level
Function properties extracted from algorithm details:
• resource consumption
• bandwidth requirement
• memory architectures
• data dependency
Function level
Function nodes are assigned:
• As Late As Possible (ALAP) levels, based on interactions between function nodes
• As Timely As Possible (ATAP) levels, based on data dependency inside a function
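To make the ALAP level assignment concrete, the sketch below applies the standard as-late-as-possible levelling to a small directed acyclic graph of function nodes. It assumes that "interactions between function nodes" can be modelled as producer-to-consumer edges; the function name `alap_levels` and the example graph are hypothetical, and ATAP levels are not covered because the poster does not define them in enough detail.

```python
from collections import defaultdict


def alap_levels(edges, nodes):
    """Assign As-Late-As-Possible levels in an acyclic graph of function nodes.

    edges: (producer, consumer) pairs. Sink nodes receive the last level;
    every other node is placed one level before its earliest consumer.
    """
    succs = defaultdict(list)
    for src, dst in edges:
        succs[src].append(dst)

    dist = {}  # longest path, counted in edges, from a node to any sink

    def longest_to_sink(node):
        if node not in dist:
            dist[node] = 1 + max(
                (longest_to_sink(s) for s in succs[node]), default=-1
            )
        return dist[node]

    depth = max(longest_to_sink(n) for n in nodes)
    return {n: depth - dist[n] for n in nodes}


edges = [("A", "B"), ("B", "D"), ("C", "D")]
print(alap_levels(edges, ["A", "B", "C", "D"]))
```

In this example node C has no predecessors, yet it is placed at level 1 rather than level 0, because its only consumer D does not run until level 2. Delaying nodes in this way is what lets functions that are active at the same time be grouped together.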
Benchmarks:
• Barrier Option Pricing (BOP)
• Particle Filter (PF)
• Reverse-Time Migration (RTM)
[Figure panels: exploring the original search space; exploring the pruned search space]
Scalable design method:
• Design rules to eliminate redundant search space
• Design algorithm to only explore valid design space
Improvement over static designs:
• FPGA devices: 4 Xilinx Virtex-6 SX475T FPGAs, hosted in a Maxeler MPC-C500 computing node, running at 100 MHz
• 1.94, 2.19 and 1.61 times faster for barrier option pricing, particle filter and reverse time migration, respectively
• Static design: 34% to 59% of theoretical performance
• Reconfiguration removing idle functions: 98% of theoretical performance
• Inefficiency for dynamic design: due to reconfiguration overhead
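The two sources of inefficiency named above can be expressed with a simple model. This is an assumed back-of-the-envelope sketch, not the poster's performance model: a static design loses performance in proportion to the resources held by idle functions, while a dynamic design keeps all resources active but pays reconfiguration time. The function names and the numbers are illustrative and are not measurements from the poster.

```python
def static_efficiency(active_resource, total_resource):
    """Fraction of theoretical performance when only part of the chip
    is occupied by functions that are currently active."""
    return active_resource / total_resource


def dynamic_efficiency(compute_time, reconfig_time):
    """Fraction of theoretical performance when all resources are active
    but some run time is spent reconfiguring the device."""
    return compute_time / (compute_time + reconfig_time)


# Illustrative values only.
print(static_efficiency(active_resource=40, total_resource=100))
print(dynamic_efficiency(compute_time=9.5, reconfig_time=0.5))
```

Under this model, reconfiguration pays off whenever the time lost to reconfiguring is a smaller fraction of run time than the fraction of resources left idle in the static design.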
Improvement over optimised CPU and GPU designs:
• CPU: 24 Intel Xeon X5660 cores running at 2.67 GHz
• GPU: an NVIDIA Tesla C2070 card running at 1.15 GHz, linearly scaled by 4 for comparison with multi-FPGA designs
• Up to 27 times faster than CPU, 1.55 times faster than GPU
• Up to 29 times more energy efficient than CPU and GPU designs
• Proposed approach applicable to multi-chip environment
• Scalability: limited by reconfiguration overhead, which can be overcome by parallel reconfiguration of multiple FPGAs
Acknowledgement:
This work was supported in part by UK EPSRC, by the European Union Seventh Framework Programme under grant agreement numbers 257906, 287804 and 318521, by the HiPEAC NoE, by the Maxeler University Programme, and by Xilinx.