Hy-C: A Compiler Retargetable for Single-Chip Heterogeneous Multiprocessors
Philip Sweany
8/27/2010

No single architecture solves all power problems
[Figure: power efficiency in mW/MIPS (log scale, 0.001 to 10000) versus year (1980 to 2010) for General Purpose Processor, Software Programmable DSP, and Hard-wired proxy; annotations: 10 X, 100 X]
• Industry has debated merits of each architecture for

Retargetable Compilation
• Why?
• Rocket
  – C compiler, written in C++
  – Retargetable for ILP computers
  – Single machine description file
  – Development 1989-2000
• GNU

Hybrid Computing
• Heterogeneous processors on single chip
  – “CPU”
  – FPGA
  – ASIC
  – N “CPU”s, M FPGAs, K ASICs
• Tradeoffs of performance, power, flexibility

Generic Hybrid Architecture
[Diagram: CPU 1, CPU 2, ..., CPU m (Multi-CPU); FPGA 1, FPGA 2, ..., FPGA n (Multi-FPGA); Shared Memory]

Generic Hy-C Tools
[Diagram labels: Source Code; Objectives/Constraints; System Specification; Partitioning; CPU Compiler; FPGA Synthesis; CPU Power-Performance Model; FPGA Power-Performance Model; Optimization Control]

Intermediate Representations
• 3-address form
• Control flow graph
• SSA (static single assignment)

Control Flow Graph
• Nodes are Basic Blocks
  – Single entry, single exit
  – No branch except (possibly) at bottom
• Edges represent one possible flow of execution between two basic blocks
• Whole CFG represents a function

Static Single Assignment
• SSA: A program is in SSA form iff
  – Each variable is statically defined exactly once, and
  – Each use of a variable is dominated by that variable’s definition.
7/29/2017

Example
X1 =
X2 =
• In general, how do we transform an arbitrary program into SSA form?
• Does the definition of X2 dominate its use in the example?
X3 = φ(X1, X2)
X4 =

SSA: Motivation
• Provides a uniform basis for an IR to solve a wide range of classical dataflow problems
• Encodes both dataflow and control flow information
• An SSA form can be constructed and maintained efficiently
• It's popular
• GCC uses SSA

Software Pipelining
• Schedule operations from multiple iterations of a loop in parallel
• Hides latency
• Compiler “reorders” loop code to include:
  – Prelude
  – Kernel
  – Postlude

Software Pipeline Benefit for “Typical” Architecture and MMult
• “Typical” Architecture
  – 8-wide Instruction-Level Parallel (ILP)
• Assuming 3000 x 3000 matrices
  – Original requires 45 million cycles
  – Pipelined version requires 3 million + 15 cycles

Current Compiler Projects
• Hy-C
  – Build tools
  – Partition algorithms
  – Retargetability and constraint specification
  – OMAP project
• Thread-level parallelism in imperative code
  – Limit study
  – Improved identification of threads
• Fast compiler-controlled memory

OMAP4 Sub-System Encapsulation
[Block diagram of OMAP4 application sub-systems. Recoverable labels: Chiron (2xCortex-A9); Ducati; Tesla; C64x; IVA HD; ISS; DSS; HLOS; DSP/BIOS; Apps / Frameworks; Camera Control; Camera Imaging; Distributed OpenMAX; OMX Image; OMX Audio; OMX Video; 3P extensions; IPC; Image HWA; Video HWA; Programmable Image/Video; GFX; OpenGL; Audio; Audio Back-End; Displays; HDMI; LCD; Analog TV; Storage; USB; I/O & Peripherals; Power]

OMAP Resources
[Diagram: Chiron, Tesla, Ducati (Multi-CPU); Shared Memory]

OMAP Processor Resources
• Chiron
  – 2 x 600 MHz (2 symmetric processors, each at 600 MHz, with shared L2)
  – Power 600 µW/MHz
• Tesla
  – DSP Sub-System (C64x derivative); 400 MHz, 8-wide ILP
  – Power 200 µW/MHz
• Ducati
  – 200 MHz (targeted for control, low-latency code)
  – Power 100 µW/MHz

Hy-C for OMAP
[Diagram labels: Source Code; Objectives/Constraints; System Specification; Partitioning; Veyron; Ducati; Tesla; Optimization Control]

OMAP Project, Current State
• Use gcc to generate “readable” SSA graphs for C programs
• Developing translator to convert SSA graphs to Hy-C
  internal Control, Data Dependence Graphs (CDDGs).
• Translator to Hy-C CDDGs successfully tested on small C programs

Partition Algorithm
• Examine Control Flow Graph (CFG) for a function
  – Identify software pipelining possibility
  – Build Dependence Graph (combining data and control dependence)
• Choose one of three resources for the function

Partition Algorithm (cont.)
• If software pipelining is profitable, place function on C64 DSP resource
• Else examine Dependence Graph
  – if (number of nodes / critical path length) > 1.5, place on double-issue ARM
  – else place on single-issue ARM

Long-Term Future
• Automatic Code Generation (I don’t believe in software)
• Visual Programming of Components
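
Sketch: Partition Algorithm decision rule
The rule on the Partition Algorithm slides can be sketched in a few lines of Python. This is an illustration only, not Hy-C code. The function names, the dictionary representation of the dependence graph, and the choice to measure the critical path in nodes (unit latency per node) are all assumptions; the slides give only the 1.5 threshold and the three target resources. Whether software pipelining is profitable is passed in as a flag, since the slides do not say how that is decided.

```python
from functools import lru_cache


def critical_path_length(graph):
    """Longest path, counted in nodes, through an acyclic dependence graph.

    `graph` maps each node to a list of its successors; every node must
    appear as a key. Unit latency per node is an assumption of this sketch.
    """
    @lru_cache(maxsize=None)
    def longest_from(node):
        # A node with no successors contributes a path of length 1.
        return 1 + max((longest_from(s) for s in graph[node]), default=0)

    return max((longest_from(n) for n in graph), default=0)


def choose_resource(graph, pipelining_profitable, threshold=1.5):
    """Pick one of the three resources named on the slides for a function."""
    if pipelining_profitable:
        return "C64 DSP"
    length = critical_path_length(graph)
    # nodes / critical path length approximates the available parallelism.
    if length and len(graph) / length > threshold:
        return "double-issue ARM"
    return "single-issue ARM"


# Hypothetical dependence graphs, used only to exercise the rule.
chain = {"a": ["b"], "b": ["c"], "c": []}                           # ratio 3/3
wide = {"a": ["e"], "b": ["e"], "c": ["e"], "d": ["e"], "e": []}    # ratio 5/2

print(choose_resource(chain, pipelining_profitable=False))
print(choose_resource(wide, pipelining_profitable=False))
print(choose_resource(chain, pipelining_profitable=True))
```

A purely serial chain has a ratio of 1.0 and goes to the single-issue ARM, while the wide graph has a ratio of 2.5 and goes to the double-issue ARM. A ratio of exactly 1.5 does not exceed the threshold, matching the strict ">" on the slide.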
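
Sketch: where the Software Pipeline Benefit numbers could come from
The cycle counts on the Software Pipeline Benefit slide are consistent with a simple model, shown here as a hedged reading rather than the author's stated derivation. The assumptions, none of which appear on the slide, are: 3 million loop iterations, a 15-cycle latency per iteration, and a pipelined kernel that starts one new iteration every cycle (initiation interval of 1). The pipelined count uses the rough form "iterations x interval + latency", which matches the slide's "3 million + 15".

```python
def sequential_cycles(iterations, iteration_latency):
    """Cycles when each iteration finishes before the next one starts."""
    return iterations * iteration_latency


def pipelined_cycles(iterations, iteration_latency, initiation_interval=1):
    """Rough cycle count for a software-pipelined loop.

    The kernel issues a new iteration every `initiation_interval` cycles;
    prelude and postlude together add about one iteration latency.
    """
    return iterations * initiation_interval + iteration_latency


# Assumed values chosen to match the slide; they are not given in the source.
ITERATIONS = 3_000_000
LATENCY = 15

original = sequential_cycles(ITERATIONS, LATENCY)
pipelined = pipelined_cycles(ITERATIONS, LATENCY)
print(original, pipelined, round(original / pipelined, 2))
```

Under these assumptions the speedup approaches the iteration latency (about 15x) as the iteration count grows, because the fixed prelude and postlude cost is paid only once.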