An Integrated Debugging Environment for Reprogrammable Hardware Systems
Kevin Camera, Hayden So, Bob Brodersen
Berkeley Wireless Research Center
University of California, Berkeley
AADEBUG 2005

Outline
• Motivation
• Existing platform
• Existing design/verification flow
• Proposed solution
• Environment features
• Walkthrough
• Implementation strategy

Application Domain
• Direct-mapped, reprogrammable hardware systems
• FPGA-based signal processing and supercomputing arrays

FPGA Computing Benefits
• Better power, computation, and cost efficiency than any processor-based solution, due to direct mapping of algorithms

Metric                         XC2VP70-7    C6415T-1G
Computation Rate (Gop/s)       72           4
Power Efficiency (Gop/s/W)     2.72         1.84
Price/Performance (Mop/s/$)    31.0         14.81

Source: Chang, Wawrzynek, Brodersen; ISCA ’05

BEE2: 2nd Berkeley Emulation Engine
• (5) Xilinx V2P100 per board
  • ~100K logic cells
  • 2 PowerPC405 cores
  • 444 dedicated multipliers
  • 1MB on-chip SRAM
  • 3.125Gb/s duplex links
• (4) DDR2 banks per FPGA
  • 72 bits per bank with ECC
  • Up to 12.8 (DDR400) or 17 (DDR533) GB/s bandwidth
  • Up to 4GB capacity

BEE Design Flow
[Figure: flow diagram with stages labeled Design, Netlist, Place and Route, Verify, Hardware]
• Design entry is in the Matlab/Simulink environment
• Graphical, library based; also allows custom HDL
• Typical FPGA path to physical implementation
• HDL synthesis and place and route
• Hierarchy is flattened in each pass (non-modular flow)

Design Verification Methods
• High-level functional simulation
• HDL/RTL simulation
• Native FPGA execution
[Figure: axis labeled "Complexity, Accuracy"]

High-level Functional Simulation
• Design execution in Matlab/Simulink
• Intended to be correct by construction
• Fastest software-based simulation
• Powerful and convenient algorithm exploration

Drawbacks of High-level Simulation
• Even with a high level of abstraction, vastly slower than hardware
• Doesn’t cover any side effects or requirements of the backend tool chain
• Trend is worsening with increased FPGA capacity
[Figure: bar chart of execution time in seconds. SW: 38.8, HW: 2E-06]

HDL/RTL Simulation
• Varying levels of accuracy
• Access to arbitrary internal signals
• But simulation speed is even slower
• Parameterization/iteration is much harder

Native FPGA Execution
• Runs at full speed of hardware
• Three tools for on-FPGA testing:
  • Xilinx ChipScope Pro
  • System Generator HW-in-the-loop
  • Good old-fashioned signal probing

Xilinx ChipScope Pro
• Inserts BRAM cores into design and binds to JTAG
• Captures selected signals and provides trigger conditions
• Signals of interest must be chosen in advance
• Captured state is limited by available BRAM
• Any changes require tool flow re-iteration

System Generator HW-in-the-loop
• Allows hardware itself to accept and process data from Simulink via JTAG
• Arbitrary number of data elements can be accessed as “ports”
• Very powerful tool, but features limited process control

Hands-on Hardware Debugging
• Most accurate method for finding timing-related bugs in a “production” system
• Tradeoffs are all too well-known:
  • Complex equipment
  • Limited probing pins
  • A priori signal output
  • Limited input options

Drawback of On-FPGA Execution
• Place and route time is a major bottleneck
• Complete run is needed for every design change
• Increasingly problematic due to larger FPGA capacity

[Figure: bar chart of tool run time in minutes]
Design (size)       Synthesis (min)    Place and Route (min)
PFB (3805)          0.28333            3.85
PFB x4 (15,301)     1.26667            35.4
PFB x8 (30,601)     3.4                90.46667

Proposed Solution
• Enable extensive debugging and design exploration functionality directly on the hardware platform
• Vastly superior execution time for today’s large-scale computing challenges
• Exploit the spatial resources of the hardware to assist in debugging
• Essentially a -g switch to the hardware design flow
• Minimize or eliminate iterations through implementation flow

Caveats
• Final timing of design will not be preserved
• Critical path will definitely be increased, but 10^6 is a lot of headroom
• Timing-driven implementation still needed once verification is complete
• Significantly more FPGA capacity and memory will be needed
• Acceptable for scalable BEE-like platforms and for modular, tiled algorithms

Essential Features of Environment
1. Robustly parameterized library components with soft configuration
   • Design exploration without tool iterations
2. Readily accessible variable contents
   • Reading and writing of any values by user
3. Complete user-driven control over process execution
   • Single-step, bursts, breakpoints, assertions

1: Parameterized Library
• Number of bits
• Saturate / Wrap
• Binary point position
• Microarchitecture

• Library components provide configuration parameters as inputs, which can be set by variables
• Allows runtime modification of function properties, including precision, range, and latency
• Enables design-space exploration at hardware speed, plus correction of configuration errors without reimplementation

2: Data Management
• Ability to dynamically observe any variable’s value at the user’s request
• Ability to overwrite a variable’s value at runtime and continue operation
• Ability to rewind system state within the bounds of buffer capacity

2: Data Management Requirements
• Too expensive to re-implement the hardware to expose new data
• All variables are streamed into local and off-chip storage, such as DRAM and disks
• Unlike software, hardware is highly parallel, and often deeply pipelined
• Memory requirements could be extreme
• Can be offset by hierarchical memory architecture and/or periodic sampling

3: Process Control
• Inherit the most useful features of software debuggers like GDB
• Cycle-by-cycle (single-step) execution
• Breakpoints (either state dependent, or fixed cycle count)
• Implemented using multiple clock domains and clock buffer control
• Already available for use on BEE2

Walkthrough: Design
• Use specialized libraries to provide soft configuration
• Integrates directly into the existing BEE2 tool flow

Walkthrough: Tagging
• User tags signals of interest with debugging testpoints
• Defines a variable name
• Defines other parameters of interest for data observation
• Also includes breakpoints and assertions

Walkthrough: Stitching
• “Stitcher” updates the design before entering back-end tool flow
• Inserts logic as needed for debug functions
• Instantiates PowerPC core and master controller
• Adds underlying connections to route data

Walkthrough: Runtime
• User can monitor variables and control process execution from remote client
• Embedded PowerPC software provides a thin service layer
• Client is fully integrated with Matlab and Simulink input description

Control Architecture on BEE2
[Figure: block diagram. Labels: Control FPGA, Network, PPC, Singlestep, Clock Buffer Logic, 100MHz, User Defined (~1-10MHz), Clock domains, Breakpoint interrupt, Control, DRAM, Inserted Logic, User Design, User FPGA]

Stitching
• Stitcher traverses the design hierarchy and:
  • Replaces debugging component placeholders with necessary logic
  • Creates a simple route from all variables to off-chip storage devices
• During execution, the stitcher records:
  • A mapping between variable names and their physical variable unit in hardware
  • The latency within the variable routing network

Variable Control Unit (VCU)
• Inserted in place of each variable block in design
• Automatically implied for every state variable in a state machine
• Combination of local buffers and off-chip DRAM
• Exact memory allocation is subject to experimentation

Debug Controller (DC)
• Interface between all variable and assertion instances, the runtime user shell, and process control “services”
• Regulates the system clock both for exceptions and to prevent variable storage overflows

Runtime Shell Examples
load      Initialize or reset a design
halt      Stop the design as soon as possible
runfor    Run the design for a number of cycles
cont      Run the design until the next exception
break     View, enable, or change a breakpoint
view      View a variable’s value or history
set       Override a variable’s value or source
rewind    Rewind the system state by n cycles

Future Work
• Complete infrastructure for BEE2
• Extensive experiments with variable memory
• Efficient methods for variable routing
• Storage requirements and hierarchy
• Time/Space tradeoffs for periodic sampling
• Generalize framework to define concepts such as variable priorities, multiple debug levels, and extensions to text-based languages

Questions?
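
Backup: Runtime Shell Sketch
As a supplement to the Runtime Shell Examples slide, the following is a minimal software-only sketch of how commands such as load, runfor, cont, break, view, set, and rewind might behave, with rewind limited by a bounded history buffer as described under Data Management. It is not the BEE2 implementation, which relies on clock buffer control and embedded PowerPC software. The class `ToyDebugController`, the `accumulator` design, and every method and parameter name are hypothetical and chosen only for illustration.

```python
from collections import deque


class ToyDebugController:
    """Toy software model of the shell commands; all names are hypothetical."""

    def __init__(self, step_fn, init_state, history_depth=8):
        self.step_fn = step_fn              # advances the design by one clock cycle
        self.init_state = dict(init_state)
        self.history_depth = history_depth  # stands in for finite buffer capacity
        self.breakpoints = {}               # name -> predicate(state, cycle)
        self.load()

    def load(self):
        """load: initialize or reset the design."""
        self.state = dict(self.init_state)
        self.cycle = 0
        self.history = deque(maxlen=self.history_depth)

    def _step(self):
        # Snapshot the state before each cycle so that it can be restored later.
        self.history.append((self.cycle, dict(self.state)))
        self.state = self.step_fn(dict(self.state))
        self.cycle += 1

    def set_break(self, name, predicate):
        """break: register a state-dependent or cycle-count breakpoint."""
        self.breakpoints[name] = predicate

    def runfor(self, n):
        """runfor: run up to n cycles; return names of breakpoints that fired."""
        for _ in range(n):
            self._step()
            hit = [k for k, p in self.breakpoints.items()
                   if p(self.state, self.cycle)]
            if hit:
                return hit
        return []

    def cont(self, max_cycles=10_000):
        """cont: run until the next breakpoint (bounded here for safety)."""
        return self.runfor(max_cycles)

    def view(self, name):
        """view: read a variable's current value."""
        return self.state[name]

    def set(self, name, value):
        """set: override a variable's value; execution can then continue."""
        self.state[name] = value

    def rewind(self, n):
        """rewind: restore the state from n cycles ago, if still buffered."""
        if n > len(self.history):
            raise ValueError("rewind exceeds buffered history")
        for _ in range(n):
            self.cycle, self.state = self.history.pop()


def accumulator(state):
    # Hypothetical one-register design: acc accumulates the input x every cycle.
    state["acc"] += state["x"]
    return state


dc = ToyDebugController(accumulator, {"acc": 0, "x": 1}, history_depth=4)
dc.set_break("acc_ge_5", lambda s, c: s["acc"] >= 5)
fired = dc.cont()    # stops once acc reaches 5
dc.rewind(2)         # step back two cycles
dc.set("x", 10)      # override the input
dc.runfor(1)         # single-step with the new input
print(fired, dc.cycle, dc.view("acc"))
```

In this sketch the history depth plays the role of buffer capacity: a rewind request deeper than the retained snapshots is rejected rather than guessed at.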